This Ph.D. thesis is an important contribution to a new dimension of
statistical reasoning for which I propose the name FUN.STAT (because it is
fun; functional (useful); based on functional analysis; estimates functions;
and graphs functions). FUN.STAT has three important components: quantile
and density-quantile signatures of populations, entropy and information
measures, and functional inference.

The joint density-quantile function of (X, Y), where X and Y are jointly
continuous random variables, can be represented as

    $$ fQ_{X,Y}(u_1,u_2) = f_{X,Y}(Q_X(u_1), Q_Y(u_2)) = fQ_X(u_1)\, fQ_Y(u_2)\, d(u_1,u_2) $$

in terms of the marginal density-quantile functions $fQ_X(u)$, $fQ_Y(u)$, and
the dependence density $d(u_1,u_2)$. How these three functions can be
semi-automatically estimated, by autoregressive or exponential model
estimators with maximum entropy properties, is investigated in this thesis.
The results provide important and useful procedures for nonparametric
bivariate density estimation. The thesis discusses estimators of the entropy
$H(d)$ of $d(u_1,u_2)$, which seem to me to be important because they can be
applied to provide a useful quality-index for projection-pursuit data
analysis methods.
1.  INTRODUCTION

1.1  The  Problem

Much  of  statistical  analysis  revolves  around  the 
interrelationship  between  random  variables.  One  may  explore  cause  and 
effect  relationships,  investigate  the  covariance  structure  of  a 
collection  of  random  variables,  or  attempt  to  discover  the  underlying 
probability  mechanism  that  produces  a  vector  of  random  variables.  The 
areas  of  regression  and  correlation  analysis,  analysis  of  variance, 
categorical  data  analysis,  and  the  general  area  of  multivariate
analysis  attempt  to  confront  some  of  the  relevant  problems  in  dealing 
with  relationships  among  random  variables.  Mathematical  tools  from 
probability  theory  and  the  theory  of  vector  spaces  assist  in  analyzing 
the  abstract  problem,  but  one  must  also  overcome  computational 
difficulties  that  arise  from  examining  discrete  observations  from  a 
continuous  multivariate  distribution.  The  esoteric  nature  of 
statistical  analysis  results  from  the  wide  range  of  mathematical  and 
computational  tools  that  must  be  employed  in  solving  general  data 
analytic  problems.  In  this  work  we  attempt  to  consolidate  a  variety 
of  such  tools  to  provide  a  solid  base  from  which  to  attack  the  general 
problem  of  multivariate  data  analysis.  We  have  chosen  bivariate  data 
modeling  as  the  logical  starting  point,  and  that  is  the  primary 
subject  of  this  thesis;  however,  multivariate  generalizations  will  be
suggested  whenever  appropriate.

This dissertation will follow the format for the Journal of the
American Statistical Association.

In its most general form, the problem is to infer from a bivariate
random sample $\{(X_i, Y_i),\ i = 1, \ldots, n\}$ the nature of the joint
cumulative distribution function (c.d.f.)

    $$ F_{X,Y}(x,y) = P(X \le x,\ Y \le y) $$

and the marginal cumulative distribution functions

    $$ F_X(x) = P(X \le x), \qquad F_Y(y) = P(Y \le y). $$

Knowledge  of  these  entities  will  answer  most  of  the  questions  posed  in 
regression  and  correlation  analysis,  but  more  generally,  information 
will  be  provided  about  the  dependence  structure  between  the  random 
variables  X  and  Y.  The  theory  of  parametric  inference  is  based  on 
assumptions  concerning  these  functions,  but  one  is  still  faced  with 
the  problem  of  testing  these  assumptions.  We  will  emphasize  the 
problem  of  determining  the  dependence  structure  between  two  random 
variables,  but  our  approach  will  lead  to  solutions  to  more  general 
problems  of  bivariate  data  analysis. 

A  specific  application  will  be  to  provide  techniques  for  testing 
the  null  hypothesis 


    $H_0$: X and Y are independent


against some suitable alternative. Many useful techniques already
exist for handling this problem, but often such techniques carry
restrictive assumptions or demand too much computational complexity.
We will propose a general technique carrying few restrictions that is
computationally manageable and that suggests applications to other
areas of bivariate data modeling.

1.2  Survey  of  the  Literature 

A  wide  variety  of  sources  exist  from  which  to  extract  useful 
information  for  attacking  the  problem  of  bivariate  data  analysis.  For 
general nonparametric measures of association, Lehmann (1966),
Blomqvist (1950), Blum, Kiefer, and Rosenblatt (1961), and Hoeffding
(1948) provide useful fundamental information. Puri, Sen, and Gokhale
(1970) and Parzen (1977) contain useful discussions of independence
tests in a multivariate setting. Classical normal theory is
exemplified in Morrison (1976) with more theoretical results appearing
in Kshirsagar (1972) and Rao (1973).

The approach we employ considers function approximation in a
nonparametric setting using information measures. References on
nonparametric density estimation are provided by Rosenblatt (1956),
Parzen (1962), Cacoullos (1966), Loftsgaarden and Quesenberry (1969),
Kronmal and Tartar (1968), Tartar and Kronmal (1970, 1976), Crain
(1974), Carmichael (1976), and Good and Gaskins (1980). General
expository and bibliographic sources are provided by Tapia and
Thompson (1978), Bean and Tsokos (1980), Wertz (1978), and Silverman
(1980). For the mathematical theory of function approximations,
Lanczos (1956), Rainville (1960), Davis (1975), and Powell (1981) are
useful texts. Loeve (1977), Hewitt and Stromberg (1965), and Royden
(1968) contain some results from functional analysis that may also be
applied to this problem in a measure space setting. Parzen
(1959, 1961) also contains some useful function space results in a
statistical setting. Shannon (1948) and Kullback (1978) are the
fundamental references for information theory.

The motivation for much of this research is provided by Parzen
(1977, 1979), Kimeldorf and Sampson (1975b), Crain (1974), and Tartar
and Kronmal (1970). The quantile domain approach to statistical data
modeling found in Parzen (1979b) provides some useful solutions that
can be extended to bivariate data analysis, with Kimeldorf and Sampson
(1975b) providing some useful bivariate theory to apply to the
problem. The orthogonal expansion technique as a method of
nonparametric density estimation seems to be the best suited for
multivariate extension of univariate methods. The ideas of Crain
(1974) and Tartar and Kronmal (1970) motivate the development of a
modification of their techniques based on a general regression
framework using information theoretic notions.

Scott et al. (1978) apply the bivariate kernel method to a
set of coronary heart disease data. Tartar and Silvers (1975) apply
orthogonal  expansion  techniques  to  the  problem  of  bivariate  Gaussian 
mixture  decompositions.  These  applications  suggest  a  need  for  a  more 
objective  and  less  cumbersome  approach  to  the  problem  of  diagnosing 
the  shape  of  a  bivariate  density,  which  motivates  the  applications 
considered  in  the  present  work. 
2.  MATHEMATICAL  AND  STATISTICAL  FUNDAMENTALS

2.1  Introduction 

The approach to bivariate data modeling that we will develop in
Chapter 4 is motivated by three concepts: 1) the use of quantile
based data analytic tools; 2) the use of information and entropy
criterion functions; and 3) the application of some powerful results
from approximation theory. This chapter provides expository
information on these subjects along with a few remarks and
observations that may not be found in the literature. An additional
section is included describing some elements of stochastic processes
and complex regression applicable to the models employed in Chapter 4.
We have assumed knowledge of basic mathematical statistics similar to
that found in Rao (1973).

2.2  Uniform  Representations  and  the  Probability  Integral  Transform 

In this section we introduce concepts of probability modeling set
in the quantile domain, i.e., a domain of consideration in which the
quantile function is the fundamental entity. The foundation of much
of the theory will be directly or indirectly related to the
probability integral transform, which makes the quantile approach so
appealing.

Let X be a random variable (r.v.) with c.d.f. F and probability
density function (p.d.f.) f. Define the quantile function Q(u) of X
by

    $$ Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}, \quad 0 < u \le 1. \tag{2.2.1} $$

When two or more r.v.'s are considered, one affixes a subscript and
denotes $Q_X(u)$ as the quantile function of X, etc. This definition of
a quantile function yields a result known as the correspondence
identity, namely

    $$ F(x) \ge u \ \text{ iff } \ Q(u) \le x. \tag{2.2.2} $$

(The expression iff is the commonly used mathematical abbreviation for
"if and only if".) When F is continuous, one has the inverse identity

    $$ FQ(u) = F(Q(u)) = u. \tag{2.2.3} $$

Differentiating the inverse identity, one obtains the reciprocal
identity

    $$ fQ(u)\, q(u) = 1. \tag{2.2.4} $$

The notation fQ(u) refers to the density-quantile function defined to
be the composite function f(Q(u)). One also has the quantile-density
function q(u) defined to be the derivative of the quantile function.
Another useful function is the negative of the derivative of fQ(u),
often called the score function, given by

    $$ J(u) = -f'Q(u)\, q(u), \tag{2.2.5} $$

where f'Q(u) denotes f'(Q(u)). By the reciprocal identity, the score
function is usually written

    $$ J(u) = -f'Q(u) / fQ(u). \tag{2.2.6} $$

Randles and Wolfe (1979b) call J(u) the optimal score function. One
application for J(u) involves the concept of information. Consider a
form of Fisher's information,

    $$ I(f) = \int_{-\infty}^{\infty} \left[\frac{d}{dx} \log f(x)\right]^2 f(x)\, dx
            = \int_{-\infty}^{\infty} \frac{[f'(x)]^2}{f(x)}\, dx
            = \int_0^1 |J(u)|^2\, du. \tag{2.2.7} $$

Thus, this information measure requires only knowledge of the score
function. For Shannon entropy, the density-quantile function is the
fundamental object, namely

    $$ H(f) = \int_{-\infty}^{\infty} -[\log f(x)]\, f(x)\, dx
            = \int_0^1 -\log fQ(u)\, du. \tag{2.2.8} $$
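As a numerical illustration of (2.2.7) and (2.2.8), the following sketch evaluates both quantile-domain integrals for a normal density, where closed forms are available for comparison. The choice of distribution, the value of SIGMA, and all helper names are assumptions made only for this illustration; the midpoint rule is one convenient quadrature, not a method taken from the text.

```python
import math
from statistics import NormalDist

# Assumption for illustration: X ~ N(0, SIGMA^2).
SIGMA = 2.0
dist = NormalDist(mu=0.0, sigma=SIGMA)

def fQ(u):
    """Density-quantile function f(Q(u))."""
    return dist.pdf(dist.inv_cdf(u))

def score(u):
    """Score function J(u) = -f'(Q(u)) / f(Q(u)); for N(0, SIGMA^2) this is Q(u) / SIGMA^2."""
    return dist.inv_cdf(u) / SIGMA ** 2

def midpoint_integral(func, m=100_000):
    """Midpoint rule on (0, 1); never evaluates func at u = 0 or u = 1."""
    return sum(func((k + 0.5) / m) for k in range(m)) / m

# Fisher information via the score function, as in (2.2.7).
fisher = midpoint_integral(lambda u: score(u) ** 2)
# Shannon entropy via the density-quantile function, as in (2.2.8).
entropy = midpoint_integral(lambda u: -math.log(fQ(u)))

fisher_exact = 1.0 / SIGMA ** 2
entropy_exact = 0.5 * math.log(2.0 * math.pi * math.e * SIGMA ** 2)

print(fisher, fisher_exact)
print(entropy, entropy_exact)
```

Both integrals are taken over the unit interval, which is the practical appeal of the quantile-domain forms: no truncation of an infinite range is needed.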

With  the  quantile  building  blocks  considered  above,  one  may  now 
state  two  fundamental  theorems  that  will  be  exploited  later. 


Theorem 2.2.1 Let the r.v. U be distributed uniformly on the
interval [0,1], and let F be a c.d.f. Define the r.v. X by X = Q(U),
where Q is the quantile function associated with F. Then the c.d.f.
of X is F.

Proof: $P(X \le x) = P[Q(U) \le x] = P[U \le F(x)] = F(x)$.

Theorem 2.2.2 Let X be a r.v. with continuous c.d.f. F. Then
U = F(X) is a uniform r.v. on the interval [0,1]. (In the proof, Q(u) is
the quantile function of X.)

Proof: $P(U \ge u) = P[F(X) \ge u] = P[X \ge Q(u)] = P[X > Q(u)]
= 1 - P[X \le Q(u)] = 1 - FQ(u) = 1 - u$.

Details using some of the aforementioned identities are omitted
from the above proofs but may easily be supplied by the reader. The
transformation U = F(X) is called the probability integral transform and
is very useful in attacking general problems in such a way that only
uniform distributions need be considered. The probability integral
transform also reduces the general simulation problem to one of
simulating uniform [0,1] random variables. One calls U the uniform
representation of X. This terminology will become more meaningful in
the bivariate case. In the univariate case, any continuous random
variable has the same uniform representation. The usefulness occurs
when results are invariant to the probability integral transform.
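Theorems 2.2.1 and 2.2.2 can be checked by simulation. The sketch below is an illustration only: the exponential distribution, the rate LAM, the sample size, and the helper `ks_distance` are assumptions introduced here, and the Kolmogorov distance is used merely as a convenient summary of closeness.

```python
import math
import random

LAM = 1.5  # hypothetical rate parameter of an exponential distribution

def F(x):
    """Exponential c.d.f."""
    return 1.0 - math.exp(-LAM * x) if x > 0.0 else 0.0

def Q(u):
    """Quantile function Q(u) = F^{-1}(u), valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / LAM

def ks_distance(sample, cdf):
    """Kolmogorov distance sup_x |F_n(x) - cdf(x)| between the e.d.f. and a c.d.f."""
    s = sorted(sample)
    n = len(s)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(s))

rng = random.Random(0)
n = 20_000

# Theorem 2.2.1: X = Q(U) with U uniform should have c.d.f. F.
x_from_uniform = [Q(rng.random()) for _ in range(n)]
dist_to_F = ks_distance(x_from_uniform, F)

# Theorem 2.2.2: U = F(X) should be uniform when X has continuous c.d.f. F.
x_direct = [rng.expovariate(LAM) for _ in range(n)]
u_from_x = [F(x) for x in x_direct]
dist_to_uniform = ks_distance(u_from_x, lambda t: min(max(t, 0.0), 1.0))

print(dist_to_F, dist_to_uniform)  # both should be small, of order 1/sqrt(n)
```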

Moments may be considered in the quantile domain by applying some
of the results obtained above. Observe,

    $$ \mu = E(X) = E[Q(U)] = \int_0^1 Q(u)\, du \tag{2.2.9} $$

from Theorem 2.2.1, and

    $$ \sigma^2 = \mathrm{Var}(X) = \int_0^1 [Q(u) - \mu]^2\, du. \tag{2.2.10} $$


Another application involves transformations of the form Y = g(X),
where X has a known distribution. A common transformation is the
location-scale transformation

    $$ Y = \mu + \sigma X, \quad \sigma > 0. \tag{2.2.11} $$

One is given $Q_X(u)$ and wishes to obtain $Q_Y(u)$. Observe,

    $$ F_Y(y) = P(Y \le y) = P(\mu + \sigma X \le y) = P[X \le (y-\mu)/\sigma] = F_X[(y-\mu)/\sigma]. $$

The correspondence identity

    $$ F_Y(y) \ge u \ \text{ iff } \ Q_Y(u) \le y $$

is equivalent to

    $$ F_X[(y-\mu)/\sigma] \ge u \ \text{ iff } \ Q_X(u) \le (y-\mu)/\sigma. $$

Furthermore,

    $$ Q_X(u) \le (y-\mu)/\sigma \ \text{ iff } \ \mu + \sigma Q_X(u) \le y. $$

Hence, it follows that

    $$ Q_Y(u) = \mu + \sigma Q_X(u). \tag{2.2.12} $$

One may seek similar results for general transformations Y = g(X). Let
g be a strictly increasing function. Then,

    $$ F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X[g^{-1}(y)]. $$

Again, the correspondence identity for Y is equivalent to

    $$ F_X[g^{-1}(y)] \ge u \ \text{ iff } \ Q_X(u) \le g^{-1}(y) $$

and

    $$ Q_X(u) \le g^{-1}(y) \ \text{ iff } \ g[Q_X(u)] \le y. $$

Thus,

    $$ Q_Y(u) = g[Q_X(u)]. \tag{2.2.13} $$

Now, suppose g is strictly decreasing. It follows readily that

    $$ F_Y(y) = 1 - F_X[g^{-1}(y)]. $$

The correspondence identity for Y is equivalent to

    $$ F_X[g^{-1}(y)] \le 1-u \ \text{ iff } \ Q_X(1-u) \ge g^{-1}(y) $$

and

    $$ Q_X(1-u) \ge g^{-1}(y) \ \text{ iff } \ g[Q_X(1-u)] \le y. $$

Thus,

    $$ Q_Y(u) = g[Q_X(1-u)]. \tag{2.2.14} $$

Parzen  (1979b)  considers  the  general  problem  of  transformations  of
random  variables  to  specified  distributions  (such  as  normal)  in  light 
of  the  above  results. 
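The transformation rules (2.2.12), (2.2.13), and (2.2.14) can be verified numerically by composing each candidate quantile function with the corresponding c.d.f. and checking that the inverse identity F_Y(Q_Y(u)) = u holds. In this sketch X is taken to be standard normal, and the particular transformations g(x) = exp(x) and g(x) = exp(-x), the constants MU and SIGMA, and the function names are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()        # X ~ N(0, 1), so Q_X is the standard normal quantile function
MU, SIGMA = 3.0, 2.0    # hypothetical location and (positive) scale

def Q_X(u):
    return Z.inv_cdf(u)

def Q_Y_locscale(u):
    """(2.2.12): Y = MU + SIGMA * X."""
    return MU + SIGMA * Q_X(u)

def Q_Y_increasing(u):
    """(2.2.13) with the strictly increasing g(x) = exp(x)."""
    return math.exp(Q_X(u))

def Q_Y_decreasing(u):
    """(2.2.14) with the strictly decreasing g(x) = exp(-x)."""
    return math.exp(-Q_X(1.0 - u))

# The c.d.f. of Y in each case, derived directly from the distribution of X.
def F_Y_locscale(y):
    return Z.cdf((y - MU) / SIGMA)

def F_Y_increasing(y):
    return Z.cdf(math.log(y))

def F_Y_decreasing(y):
    return 1.0 - Z.cdf(-math.log(y))

grid = [k / 20 for k in range(1, 20)]
cases = {
    "location-scale": (F_Y_locscale, Q_Y_locscale),
    "increasing": (F_Y_increasing, Q_Y_increasing),
    "decreasing": (F_Y_decreasing, Q_Y_decreasing),
}
# Largest violation of the inverse identity F_Y(Q_Y(u)) = u over the grid.
max_error = {name: max(abs(F(Qf(u)) - u) for u in grid) for name, (F, Qf) in cases.items()}
print(max_error)
```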

Some useful extensions to these concepts may be applied to
goodness-of-fit (g.o.f.) procedures. If one defines $D(u) = FQ(u)$ and
$d(u) = D'(u)$, these represent the c.d.f. and p.d.f., respectively, of a
uniform (0,1) random variable. For a null hypothesis
$H_0: fQ(u) = f_0Q_0(u)$, Parzen (1979b) calls

    $$ d(u) = \frac{f_0Q_0(u)\, q(u)}{\int_0^1 f_0Q_0(t)\, q(t)\, dt} \tag{2.2.15} $$

the $f_0Q_0$ transformation density, which is a uniform (0,1) density under
the null hypothesis. The statistical applications of (2.2.15) will be
considered in Section 3.5 in the discussion of autoregressive density
estimators. Parzen also discusses tail exponents as a means of
classifying distributions based on density-quantile representations.
The reader may consult Parzen (1979b) for more extensive results in
the univariate theory.

Some of the above univariate concepts extend readily to the
bivariate case. Let X and Y be continuous r.v.'s with joint c.d.f.
$F_{X,Y}$ and marginals $F_X$ and $F_Y$. Let the respective quantile
functions be denoted by $Q_X$ and $Q_Y$. Define the dependence
distribution function $D(u_1,u_2)$ by

    $$ D(u_1,u_2) = F_{X,Y}(Q_X(u_1), Q_Y(u_2)), \quad 0 \le u_1, u_2 \le 1. \tag{2.2.16} $$

Parzen (1977) calls $D(u_1,u_2)$ the regression distribution function,
while Kimeldorf and Sampson (1975b) call it the uniform representation
of $F_{X,Y}(x,y)$. The dependence density $d(u_1,u_2)$ is given by

    $$ d(u_1,u_2) = \frac{\partial^2}{\partial u_1\, \partial u_2} D(u_1,u_2)
                  = \frac{f_{X,Y}(Q_X(u_1), Q_Y(u_2))}{f_X(Q_X(u_1))\, f_Y(Q_Y(u_2))}. \tag{2.2.17} $$


Note that while the univariate representations of the above objects
are related to the uniform (0,1) distribution and hence have
extensions to goodness-of-fit procedures, the dependence distribution
function and the dependence density have the added bivariate role of
detecting independence between two random variables, justifying the
name we have given them. Furthermore, they correspond to bivariate
r.v.'s distributed uniformly on the unit square only when X and Y are
independent, that is, $D(u_1,u_2) = u_1 u_2$ and $d(u_1,u_2) = 1$ if and
only if X and Y are independent. More general bivariate uniform
distributions are considered by Kimeldorf and Sampson (1975b).

Since  the  bivariate  normal  distribution  is  usually  the  "null 
hypothesis"  distribution  one  may  be  interested  in  the  shapes  of  the 
various  functions  of  interest.  Figure  1  depicts  a  bivariate  standard 
normal  p.d.f.  with  the  correlation  coefficient  equal  to  zero,  while
Figure  2  shows  the  linear  concentration  of  the  probability  mass  when 
the  correlation  coefficient  is  equal  to  .9.  Figure  3  shows  the 
dependence  density  corresponding  to  Figure  2,  while  for  the 
independence  case,  the  dependence  density  is  a  flat  surface 
identically  equal  to  one.  Figures  4  and  5  show  the  bivariate 
density-quantile  functions  corresponding  to  Figures  1  and  2.  Figures 
3  through  5  have  not  been  observed  in  the  literature  although  they 
contribute  insight  into  the  relationships  between  the  various 
functions  of  interest. 
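The dependence density of the bivariate standard normal distribution can be computed directly from (2.2.17). The sketch below is illustrative only; the function names and the evaluation points are assumptions. It shows that the surface is identically one when the correlation is zero, that it equals 1/sqrt(1 - rho^2) at the centre of the unit square, and that integrating out one argument returns one, as required of a density with uniform marginals.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def bvn_pdf(x, y, rho):
    """Bivariate standard normal p.d.f. with correlation rho."""
    c = 1.0 - rho * rho
    return math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * c)) / (2.0 * math.pi * math.sqrt(c))

def dependence_density(u1, u2, rho):
    """d(u1, u2) of (2.2.17) for the bivariate standard normal distribution."""
    x, y = Z.inv_cdf(u1), Z.inv_cdf(u2)
    return bvn_pdf(x, y, rho) / (Z.pdf(x) * Z.pdf(y))

# Independence: the dependence density is flat and equal to one.
flat_value = dependence_density(0.2, 0.7, 0.0)

# Strong dependence: at the centre of the unit square d = 1 / sqrt(1 - rho^2).
centre_value = dependence_density(0.5, 0.5, 0.9)

# Uniform marginals: integrating d(u1, u2) over u2 gives one for any fixed u1.
m = 2000
marginal_mass = sum(dependence_density(0.3, (k + 0.5) / m, 0.9) for k in range(m)) / m

print(flat_value, centre_value, marginal_mass)
```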

One may establish an equivalence relation based on the above
uniform representation. Two bivariate distribution functions $F_{X,Y}$ and
$G_{X,Y}$ are said to be equivalent (written $F_{X,Y} \sim G_{X,Y}$) if
$D_F = D_G$, where the subscript refers to the distribution for which the
uniform representation is defined. Thus, all bivariate distribution
functions of independent random variables are equivalent in this sense.
The following theorem allows one to apply this concept to generating new
distributions with arbitrary prescribed marginals.

Theorem 2.2.3 (Kimeldorf and Sampson, 1975b) Let $F_{X,Y}$, $G_{X,Y}$ be
bivariate distribution functions with associated marginals $F_X$, $F_Y$, and
$G_X$, $G_Y$ and corresponding quantile functions $F_X^{-1}$, $F_Y^{-1}$ and
$G_X^{-1}$, $G_Y^{-1}$. Then

    $$ F_{X,Y} \sim G_{X,Y} \ \text{ iff } \ G_{X,Y}(x,y) = F_{X,Y}[F_X^{-1}G_X(x),\ F_Y^{-1}G_Y(y)]. \tag{2.2.18} $$

Figure 2. Bivariate Standard Normal P.D.F. for Rho = 0.9


Example  2.2.1  Let  $(•)  be  the  standard  normal  c.d.f.  Then  for  p 
satisfying  0<p<1, 

F  v(x.y)  ■  $(x)  $(y)  +  [l-*(x)][l-*(y)] 

A  ,  I 

{  (mi n[  (1-  $(x) ) ,  (1-  $(y) )  ])p -1}  (2.2.19) 

has  standard  normal  marginals.  This  c.d.f.  is  derived  using  the 
bivariate  uniform  c.d.f 

D  (u j , Uj)  ■  Uju2  +  ( 1  ~u  i>  (1-u2) 

{(minCO-u,)  ,  (l-u2)])P-1J  (2.2.20) 

and taking advantage of (2.2.18). Mardia (1970) discusses this c.d.f.
in connection with the bivariate exponential distribution proposed by
Marshall and Olkin (1967).

Kimeldorf and Sampson (1975a) discuss one-parameter families of
bivariate distributions in light of Theorem 2.2.3. $G_{X,Y}$ of (2.2.18)
is called a $(G_X, G_Y)$-translate of $F_{X,Y}$. Ideally, one-parameter
families of bivariate distributions exhibit a parameter that provides
some measure of association between the random variables. Such is the
case for the one-parameter family of bivariate standard normal
distributions (i.e., $\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 1$).
Kimeldorf and Sampson base their discussion on uniform representations
of bivariate distributions.

The uniform density $d(u_1,u_2)$ defined by (2.2.17) also has
applications to regression problems. Using the definition of E(Y|X),
one may write the equivalent expression

    $$ E[Y \mid X = Q_X(u_1)] = \int_0^1 Q_Y(u_2)\, d(u_1,u_2)\, du_2 \tag{2.2.21} $$

by using the change of variable $X = Q_X(U_1)$ and $Y = Q_Y(U_2)$, where
$U_1$ and $U_2$ are (possibly dependent) uniform (0,1) random variables.
Equation (2.2.21) justifies Parzen's use of the term regression density
for $d(u_1,u_2)$. Furthermore, this representation suggests applications
to nonparametric regression which will be considered in Chapter 4.
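Equation (2.2.21) can be checked in a case where the regression function is known. For the bivariate standard normal distribution with correlation rho, E[Y | X = x] = rho x, so the integral should return rho Q_X(u_1). The sketch below assumes that distribution; the function names, the midpoint quadrature, and the evaluation points are choices made only for this illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def dependence_density(u1, u2, rho):
    """d(u1, u2) of (2.2.17) for the bivariate standard normal distribution."""
    x, y = Z.inv_cdf(u1), Z.inv_cdf(u2)
    c = 1.0 - rho * rho
    joint = math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * c)) / (2.0 * math.pi * math.sqrt(c))
    return joint / (Z.pdf(x) * Z.pdf(y))

def regression_via_dependence_density(u1, rho, m=4000):
    """Midpoint-rule evaluation of the integral in (2.2.21) with Q_Y standard normal."""
    total = 0.0
    for k in range(m):
        u2 = (k + 0.5) / m
        total += Z.inv_cdf(u2) * dependence_density(u1, u2, rho)
    return total / m

u1, rho = 0.8, 0.6
numeric = regression_via_dependence_density(u1, rho)
exact = rho * Z.inv_cdf(u1)   # known regression function rho * x at x = Q_X(u1)
print(numeric, exact)
```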

One may also consider general quantile representations of
conditional probability functions. Corresponding to the conditional
c.d.f. $F_{Y|X}(y|x)$ is a conditional quantile function $Q_{Y|X}(u|x)$
defined by (2.2.1). Parzen (1977) uses the usual change of variable to
obtain

    $$ F_{Y|X}(y|x) = \int_0^{u_2} d(F_X(x), t)\, dt \tag{2.2.22} $$

where $u_2 = F_Y(y)$. He then expresses $Q_{Y|X}(u|x)$ in terms of the
unconditional quantile function evaluated at an inverse
representation of the right hand side of (2.2.22). One would prefer a
simpler expression conducive to estimation from sample data, but
Parzen suggests estimators for $Q_{Y|X}$ whose properties remain to be
investigated. The value of the conditional quantile function is
illustrated in the following application.

To generate a univariate random sample of size n with specified
c.d.f. F, one generates a uniform random sample $U_1, \ldots, U_n$ and forms

    $$ X_i = Q(U_i), \quad i = 1, \ldots, n, \tag{2.2.23} $$

where Q is the quantile function corresponding to F. Theorem 2.2.1
guarantees that the sample has common c.d.f. F. The extension of this
approach to the bivariate case, however, is not obvious. The usual
approach is to generate collections of random variables
$V_1, V_2, \ldots, V_k$ and form

    $$ X = g(V_1, V_2, \ldots, V_k), \qquad Y = h(V_1, V_2, \ldots, V_k) \tag{2.2.24} $$

so that the appropriate distribution theory guarantees that X and Y
have specified joint c.d.f. $F_{X,Y}$. This entails generating kn random
variables to obtain 2n random variables. Furthermore, the V values
usually are transformed uniform values based on (2.2.23) so that the
simulation problem becomes unreasonably complicated. In some cases,
the appropriate distribution theory does not exist to generate the
desired random sample. To overcome this problem, one may develop a
general procedure based on the conditional quantile function.


Let $U_1$ and $U_2$ be independent uniform (0,1) random variables.
Specify joint c.d.f. $F_{X,Y}$ and form

    $$ X = Q_X(U_1), \qquad Y = Q_{Y|X}(U_2 \mid X) \tag{2.2.25} $$

where the quantile functions correspond to the choice of c.d.f. $F_{X,Y}$.
Then X and Y have specified joint c.d.f. $F_{X,Y}$. To generate a sample
of size n, one merely generates two independent samples of uniform
random variables and uses (2.2.25) to form the corresponding bivariate
sample with c.d.f. $F_{X,Y}$. Kennedy and Gentle (1980) consider a variety
of techniques for generating uniform (0,1) random variables and
discuss the design of Monte Carlo experiments. Such techniques will
be applied in Chapter 4.
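The procedure in (2.2.25) can be sketched for a case where the conditional quantile function has a closed form. For the bivariate standard normal distribution with correlation RHO, Y given X = x is normal with mean RHO x and variance 1 - RHO^2, so Q_{Y|X}(u|x) = RHO x + sqrt(1 - RHO^2) Q(u), with Q the standard normal quantile function. The target distribution, the value of RHO, the seed, and the helper names are assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()
RHO = 0.8  # hypothetical target correlation

def open_uniform(rng):
    """Uniform draw on the open interval (0, 1); rejects the endpoint 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def Q_X(u):
    return Z.inv_cdf(u)

def Q_Y_given_X(u, x):
    """Conditional quantile function of Y given X = x for the bivariate standard normal."""
    return RHO * x + math.sqrt(1.0 - RHO ** 2) * Z.inv_cdf(u)

def sample_pair(rng):
    """One draw (X, Y) from two independent uniforms, following (2.2.25)."""
    u1, u2 = open_uniform(rng), open_uniform(rng)
    x = Q_X(u1)
    return x, Q_Y_given_X(u2, x)

rng = random.Random(1)
n = 20_000
pairs = [sample_pair(rng) for _ in range(n)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

mean_x, mean_y = sum(xs) / n, sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
sample_corr = cov_xy / math.sqrt(var_x * var_y)

print(mean_y, var_y, sample_corr)
```

Only 2n uniform values are consumed to produce n bivariate observations, which is the economy the text attributes to the conditional quantile approach.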

One attempts to estimate the above quantities with statistics
that possess desirable properties. An important sample object useful
in developing statistics of interest is the empirical distribution
function (e.d.f.) $F_n(x)$ defined by

    $$ F_n(x) = (1/n)\, \{\text{no. of data points} \le x\}. \tag{2.2.26} $$

Formally, one assumes a collection $X_1, X_2, \ldots, X_n$ of i.i.d. r.v.'s,
i.e., a random sample of size n, and defines the empirical c.d.f.
$F_n(x)$ by

    $$ F_n(x) = (1/n) \sum_{i=1}^{n} 1_A(X_i) \tag{2.2.27} $$

where $A = (-\infty, x]$ and

    $$ 1_A(t) = \begin{cases} 1 & \text{if } t \in A, \\ 0 & \text{if } t \notin A, \end{cases} \tag{2.2.28} $$


is the indicator function. One observes immediately that $F_n(x)$
satisfies the properties of a c.d.f., i.e., nondecreasing, continuous
from the right, $0 \le F_n(x) \le 1$, $F_n(-\infty) = 0$, and
$F_n(\infty) = 1$. Furthermore, $F_n(x)$ may be thought of as a stochastic
process, although presently attention is paid to $F_n(x)$ for fixed x.

We now briefly state some important properties of $F_n(x)$. Let
$F(x)$ be the true population c.d.f. generating the data. Then

    $$ E[F_n(x)] = F(x), \qquad \mathrm{Var}[F_n(x)] = F(x)[1 - F(x)]/n. \tag{2.2.29} $$

Thus, $F_n(x) \to F(x)$ as $n \to \infty$ in quadratic mean, or $F_n(x)$ is
consistent in mean square for estimating $F(x)$.

The representation (2.2.27) also permits direct application of
the Strong Law of Large Numbers (SLLN) and the Central Limit Theorem
(CLT). One notes that $nF_n(x)$ is exactly binomially distributed with
parameters $(n, F(x))$, which makes it easy to deduce that $F_n(x)$ is
strongly consistent for estimating $F(x)$ and that $F_n(x)$ suitably
standardized is asymptotically normally distributed. Note that these
results pertain to the pointwise estimation of $F(x)$ by $F_n(x)$. Global
measures characterizing the "closeness" of $F_n$ in approximating F may
be found in Durbin (1973) with important asymptotic results stated
therein.
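The pointwise moments in (2.2.29) can be checked by a small Monte Carlo experiment. The sketch below is an illustration under assumed settings: uniform (0,1) data (so that F(x) = x), a fixed evaluation point x0, and arbitrary choices of sample size, number of replications, and seed.

```python
import random

def edf(sample, x):
    """F_n(x) = (1/n) * sum of indicators 1{X_i <= x}, as in (2.2.27)."""
    return sum(1 for v in sample if v <= x) / len(sample)

rng = random.Random(2)
n, reps, x0 = 50, 4000, 0.3   # uniform (0,1) data, so F(x0) = x0

# Repeatedly draw a sample of size n and record F_n(x0).
values = [edf([rng.random() for _ in range(n)], x0) for _ in range(reps)]

mc_mean = sum(values) / reps
mc_var = sum((v - mc_mean) ** 2 for v in values) / (reps - 1)

theory_mean = x0                      # E[F_n(x)] = F(x)
theory_var = x0 * (1.0 - x0) / n      # Var[F_n(x)] = F(x)[1 - F(x)] / n

print(mc_mean, theory_mean)
print(mc_var, theory_var)
```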

The Lebesgue-Stieltjes integral w.r.t. the empirical c.d.f. is
often employed to obtain method of moments estimates for the
corresponding population parameter. For example, for the parameter
defined by

    $$ \mu = \int_{-\infty}^{\infty} x\, dF(x), \tag{2.2.30} $$

the corresponding method of moments estimator $\bar{X}_n$ may be
represented by

    $$ \bar{X}_n = \int_{-\infty}^{\infty} x\, dF_n(x). \tag{2.2.31} $$

This approach to obtaining estimators has many applications. Some of
the information parameters of the next section will be estimated using
this approach.

An empirical function fundamental to the quantile approach is the
empirical quantile function given by

    $$ Q_n(u) = F_n^{-1}(u) = \inf\{x : F_n(x) \ge u\}. \tag{2.2.32} $$

Upon closer inspection one realizes that (2.2.32) is equivalent to

    $$ Q_n(u) = X_{(j)} \ \text{ for } \ (j-1)/n < u \le j/n, \quad j = 1, \ldots, n, \tag{2.2.33} $$

where $X_{(j)}$ is the j-th order statistic of the random sample. Parzen
(1979b) suggests that $Q_n(0)$ be taken to be a natural minimum when one
is available. If Q(u) is the true population quantile function, one
may define the sample quantile process by

    $$ A_n(u) = \sqrt{n}\, [Q_n(u) - Q(u)], \quad 0 \le u \le 1. \tag{2.2.34} $$
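The equivalence of the infimum definition (2.2.32) and the order-statistic form (2.2.33) can be seen on a small data set. The following sketch is illustrative; the sample values and function names are assumptions, and the evaluation grid deliberately avoids the jump points j/n so that floating-point rounding in n*u does not matter.

```python
import math

def edf(sample, x):
    """Empirical c.d.f. F_n(x)."""
    return sum(1 for v in sample if v <= x) / len(sample)

def Q_n_inf(sample, u):
    """(2.2.32): inf{x : F_n(x) >= u}; the infimum is attained at a data point."""
    for v in sorted(sample):
        if edf(sample, v) >= u:
            return v
    raise ValueError("u must satisfy 0 < u <= 1")

def Q_n_order(sample, u):
    """(2.2.33): the j-th order statistic, where (j - 1)/n < u <= j/n."""
    n = len(sample)
    j = math.ceil(n * u)
    return sorted(sample)[j - 1]

sample = [2.3, -1.0, 0.7, 5.1, 3.3, 0.2, 4.4, -0.5, 1.9, 2.8]
grid = [(k + 0.5) / 20 for k in range(20)]   # 0.025, 0.075, ..., 0.975
agree = all(Q_n_inf(sample, u) == Q_n_order(sample, u) for u in grid)
print(agree, Q_n_order(sample, 0.25))
```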

The following results may be found in Csörgő and Révész (1981).

Theorem 2.2.4 Let u, 0 < u < 1, be given. Let F(x) be absolutely
continuous in an interval about Q(u), and let fQ(u) be positive and
continuous at u. Then as $n \to \infty$,

    $$ fQ(u)\, A_n(u) / \sqrt{u(1-u)} \ \xrightarrow{d}\ N(0,1). \tag{2.2.35} $$

Theorem 2.2.5 Let Q(u) be continuous at u. Then as $n \to \infty$,

    $$ Q_n(u) \to Q(u). \tag{2.2.36} $$

Further results are given in Csörgő and Révész (1981). Serfling
(1980) also provides similar results for the sample quantile function.
More general results treat $A_n(u)$ as a stochastic process and exhibit
the left hand side of (2.2.35) as converging in distribution to a
Brownian Bridge stochastic process. The definition of a Brownian
Bridge is given in Section 2.5, but for now, one notes that the
results of Theorem 2.2.4 may be generalized to conclude

    $$ fQ(u)\, A_n(u) \ \xrightarrow{d}\ B(u), \quad \text{for all } u, \tag{2.2.37} $$

where B(u) is a Brownian Bridge process.

One may prefer to use the piecewise linear definition of the
sample quantile function given by

    $$ Q_n(u) = n[(j/n) - u]\, X_{(j-1)} + n[u - (j-1)/n]\, X_{(j)}, \quad (j-1)/n \le u \le j/n, \ j = 1, \ldots, n, \tag{2.2.38} $$

or the shifted piecewise linear version

    $$ Q_n(u) = n[(j+.5)/n - u]\, X_{(j)} + n[u - (j-.5)/n]\, X_{(j+1)}, \quad (j-.5)/n \le u \le (j+.5)/n, \ j = 1, \ldots, n-1. \tag{2.2.39} $$

Using definition (2.2.39) suggests that the empirical quantile-density
be defined by

    $$ q_n(u) = n(X_{(j+1)} - X_{(j)}), \quad (j-.5)/n < u < (j+.5)/n, \ j = 1, \ldots, n-1. \tag{2.2.40} $$

This definition of $q_n(u)$ is the derivative of (2.2.39). For any
definition of $Q_n(u)$, one may take the corresponding $q_n(u)$ to be the
raw derivative

    $$ q_n(u) = [Q_n(u+h) - Q_n(u-h)] / (2h), \quad 0 < u < 1, \tag{2.2.41} $$

where $h = h(n)$ is some predetermined positive function of n. Bloch and
Gastwirth (1968) use (2.2.41) corresponding to $Q_n(u)$ defined by
(2.2.33). Vasicek (1976) applies this definition to obtain the g.o.f.
test of normality discussed previously. Observe that for any well
defined $q_n(u)$, $fQ_n(u) = 1/q_n(u)$ is an estimate of the density-quantile
function by virtue of the reciprocal identity. The g.o.f. statistic
of Vasicek uses this fact to define the sample entropy discussed in
the next section.
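The raw derivative (2.2.41) and the resulting density-quantile estimate can be sketched on simulated data. The example below assumes an exponential(1) population, for which fQ(u) = 1 - u, and a fixed bandwidth h; the bandwidth value, sample size, seed, and function names are illustrative assumptions rather than recommendations from the text.

```python
import math
import random

def Q_n(sorted_sample, u):
    """Empirical quantile function (2.2.33) evaluated on pre-sorted data."""
    n = len(sorted_sample)
    j = min(max(math.ceil(n * u), 1), n)   # clamp the index so that 1 <= j <= n
    return sorted_sample[j - 1]

def raw_quantile_density(sorted_sample, u, h):
    """Raw derivative (2.2.41): symmetric difference quotient of Q_n."""
    return (Q_n(sorted_sample, u + h) - Q_n(sorted_sample, u - h)) / (2.0 * h)

rng = random.Random(3)
n = 20_000
data = sorted(rng.expovariate(1.0) for _ in range(n))

h = 0.05    # hypothetical bandwidth; the text only requires h = h(n) > 0
u0 = 0.5
q_hat = raw_quantile_density(data, u0, h)
fQ_hat = 1.0 / q_hat          # reciprocal identity: fQ(u) = 1 / q(u)
fQ_true = 1.0 - u0            # density-quantile function of the exponential(1) law

print(q_hat, fQ_hat, fQ_true)
```

With a fixed h the estimate carries a small smoothing bias in addition to sampling noise, which is one reason smoothed versions are sought.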

A problem with the $q_n(u)$ estimates above is that they are not
necessarily consistent estimators of q(u). Hence, one usually seeks
"smoothed" or corrected versions that yield nice asymptotic results.
One notes that often q(u) is not a function of interest except as it
is applied using the reciprocal identity. Thus, techniques for
estimating q(u) are employed and if estimates of fQ(u) are desired,
one then applies the reciprocal identity. The estimation of fQ(u)
will be considered in Chapter 3.

The bivariate functions of interest are $F_{X,Y}(x,y)$, $f_{X,Y}(x,y)$,
$D(u_1,u_2)$, $d(u_1,u_2)$, and $Q_{Y|X}(y|x)$. Raw estimates of
$F_{X,Y}(x,y)$ and $D(u_1,u_2)$ may be obtained analogously to the empirical
c.d.f., i.e., defined with jumps of size 1/n at the points $(X_i, Y_i)$ and
$(Q_i/(n+1), R_i/(n+1))$ respectively, where $Q_i = \mathrm{rank}(X_i)$ and
$R_i = \mathrm{rank}(Y_i)$. Improved versions of these estimators will be
considered in Chapters 3 and 4. Parzen (1977) suggests techniques for
estimating $Q_{Y|X}(y|x)$ based on raw estimates of $D(u_1,u_2)$. This
subject will be discussed further in Chapter 4 in relation to several
techniques of bivariate density estimation. The asymptotic results for
the bivariate case, however, remain to be investigated.
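The raw estimate of the dependence distribution function described above places mass 1/n at each rank pair scaled by n + 1. A minimal sketch follows; the function names, the simulated data, and the evaluation point are assumptions for illustration, and ties are not handled since the variables are taken to be continuous.

```python
import random

def ranks(values):
    """Ranks 1..n of the values (no tie handling; continuous data assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def raw_dependence_cdf(xs, ys, u1, u2):
    """Raw estimate of D(u1, u2): jumps of size 1/n at (Q_i/(n+1), R_i/(n+1))."""
    n = len(xs)
    q, r = ranks(xs), ranks(ys)
    return sum(1 for qi, ri in zip(q, r) if qi / (n + 1) <= u1 and ri / (n + 1) <= u2) / n

rng = random.Random(4)

# Independent X and Y: D(u1, u2) should be close to u1 * u2.
n = 2000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.expovariate(1.0) for _ in range(n)]
d_independent = raw_dependence_cdf(xs, ys, 0.5, 0.5)

# Perfect positive dependence (Y = X): the rank pairs lie on the diagonal.
zs = [rng.random() for _ in range(99)]
d_comonotone = raw_dependence_cdf(zs, zs, 0.5, 0.5)

print(d_independent, d_comonotone)
```

Because only ranks enter the estimate, it is unchanged by strictly increasing transformations of either variable, which reflects the invariance property of the uniform representation.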
2.3  Information  and  Entropy

The concept of statistical information or information numbers has
many useful applications in statistical analysis (Kullback, 1978; Rao,
1973). Fisher's information has been studied extensively and is of
primary importance in uniform minimum variance estimation and maximum
likelihood estimation. We will consider an alternative measure of
information proposed by Shannon (1948) and studied in a statistical
setting by Kullback (1978). The following definitions pertain to a
measure of information and related concepts.

Definition 2.3.1 The information $I(f;g)$ between two densities
$f(x)$ and $g(x)$ is given by

$$ I(f;g) = \int_{-\infty}^{\infty} \{\log[f(x)/g(x)]\}\, f(x)\,dx. \tag{2.3.1} $$

Definition 2.3.2 The entropy of a density $f(x)$ is given by

$$ H(f) = \int_{-\infty}^{\infty} \{-\log f(x)\}\, f(x)\,dx. \tag{2.3.2} $$

Definition 2.3.3 The cross-entropy between two densities $f(x)$ and
$g(x)$ is given by

$$ H(f;g) = \int_{-\infty}^{\infty} \{-\log g(x)\}\, f(x)\,dx. \tag{2.3.3} $$

One immediately notes that $H(f) = H(f;f)$ and that

$$ I(f;g) = H(f;g) - H(f). \tag{2.3.4} $$

Kullback (1978) proves the following fundamental theorem.

Theorem 2.3.1 Let $f(x)$ and $g(x)$ be probability densities. Then

$$ I(f;g) \ge 0, \tag{2.3.5} $$

with equality if and only if $f(x) = g(x)$ a.e.

Equation (2.3.5) is called the information inequality. We will
exploit this inequality in constructing tests for independence between
two random variables.

Generally, information is considered as a "distance" between two
densities, although it is not a metric since it is not symmetric and
does not satisfy the triangle inequality. If one wishes a symmetric
measure of information, one such definition is provided by

$$ J(f;g) = I(f;g) + I(g;f). \tag{2.3.6} $$

Observe,

$$ J(f;g) = \int_{-\infty}^{\infty} [f(x) - g(x)] \log[f(x)/g(x)]\,dx. \tag{2.3.7} $$

Kullback calls $J(f;g)$ the divergence between $f$ and $g$.

One may also note that

$$ I(f;g) = E_f[\log f(X)] - E_f[\log g(X)] \tag{2.3.8} $$

and that

$$ H(f) = E_f[-\log f(X)]. \tag{2.3.9} $$

These expectations need not be finite.
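
The relations among information, entropy, cross-entropy, and divergence
can be checked numerically. The sketch below is an illustration only: the
two normal densities, the midpoint integration rule, and all function
names are assumptions chosen here, not part of the original development.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(func, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint rule on [lo, hi]; the neglected tails are tiny for these densities."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def information(p, q):        # I(p;q), the integral of log(p/q) p
    return integrate(lambda x: math.log(p(x) / q(x)) * p(x))


def entropy(p):               # H(p), the integral of -log(p) p
    return integrate(lambda x: -math.log(p(x)) * p(x))


def cross_entropy(p, q):      # H(p;q), the integral of -log(q) p
    return integrate(lambda x: -math.log(q(x)) * p(x))


def divergence(p, q):         # J(p;q), the integral of (p - q) log(p/q)
    return integrate(lambda x: (p(x) - q(x)) * math.log(p(x) / q(x)))


# Assumed example: f is N(0, 1) and g is N(1, 4).
f = lambda x: normal_pdf(x, 0.0, 1.0)
g = lambda x: normal_pdf(x, 1.0, 2.0)

print("I(f;g)          =", information(f, g))
print("H(f;g) - H(f)   =", cross_entropy(f, g) - entropy(f))
print("I(f;g) + I(g;f) =", information(f, g) + information(g, f))
print("J(f;g)          =", divergence(f, g))
```

Note that $I(f;g)$ and $I(g;f)$ differ for this pair, which illustrates why
information is not a symmetric measure while the divergence is.
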

More general measures of information may also be developed.
Parzen (1982) discusses several general information measures, in
particular the bi-information given by

$$ II(f;g) = \int_{-\infty}^{\infty} |\log[f(x)/g(x)]|^2 f(x)\,dx. \tag{2.3.10} $$

We will exploit this definition as an estimation criterion in Chapter 4.

Our main application of information as a statistical measure will
be to the problem of ascertaining whether two random variables X and Y
are associated. For joint p.d.f. $f_{X,Y}$ and marginals $f_X$, $f_Y$, one
obtains (see section 4.5)

$$ I(f_{X,Y}; f_X f_Y) = -H(d) \tag{2.3.11} $$

where $d$ is the dependence density for X and Y. One may then exploit
(2.3.11) as a measure of dependence or association. Linfoot (1957) was
one of the first authors to consider information as a measure of
association between X and Y. Of primary importance, however, is the
fact that X and Y are independent if and only if
$I(f_{X,Y}; f_X f_Y) = 0$ by virtue of the information inequality of
Theorem 2.3.1.

One may desire to emphasize the modeling of probability laws of
random variables by using the alternate notation

$$ I(Y|X) = I(f_{X,Y}; f_X f_Y). \tag{2.3.12} $$

Using such notation it is easy to show that

$$ I(Y|X) = H(Y) - H(Y|X) \tag{2.3.13} $$

where

$$ H(Y|X) = H(f_{Y|X}). \tag{2.3.14} $$

These results are readily applicable to regression and prediction
problems.

Information thus has a dual role in statistics, being a parameter
of interest or a criterion function depending upon the setting for the
problem of interest. We will apply information criterion functions to
the problem of density estimation in section 4.4. The use of
information as a dependence parameter will be investigated in section
4.5. Generally, information serves as a useful goodness-of-fit
criterion also. Consider the general Neyman-Pearson theory of
hypothesis testing. Recall, one rejects $H_0: f(x) = f_0(x)$ in favor of
$H_1: f(x) = f_1(x)$ for specified $\alpha$ if

$$ \prod_{i=1}^{n} f_1(X_i) \ge k \prod_{i=1}^{n} f_0(X_i), \tag{2.3.15} $$

where $k$ is chosen to satisfy

$$ P\Big\{ \Big[\prod_{i=1}^{n} f_1(X_i)\Big] \Big/ \Big[\prod_{i=1}^{n} f_0(X_i)\Big] \ge k \,\Big|\, H_0 \Big\} = \alpha. \tag{2.3.16} $$

Taking logarithms of (2.3.15) and simplifying, one obtains the
equivalent expression

$$ (1/n) \sum_{i=1}^{n} \log f_1(X_i) - (1/n) \sum_{i=1}^{n} \log f_0(X_i) \ge k', \tag{2.3.17} $$

with $k' = (1/n) \log k$, which can be written

$$ H_{F_n}(f_0) - H_{F_n}(f_1) \ge k', \tag{2.3.18} $$

where

$$ H_{F_n}(f) = \int_{-\infty}^{\infty} -\log f(x)\, dF_n(x). \tag{2.3.19} $$

Another expression equivalent to (2.3.15) is

$$ I_{F_n}(f_1; f_0) \ge k', \tag{2.3.20} $$

where

$$ I_{F_n}(f;g) = \int_{-\infty}^{\infty} \log[f(x)/g(x)]\, dF_n(x). \tag{2.3.21} $$
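
The equivalence of the likelihood-ratio, entropy, and information forms
of the Neyman-Pearson statistic can be verified on a sample. In the sketch
below the two simple hypotheses (normal densities with means 0 and 1), the
simulated data, and the function names are assumptions made for
illustration; integration against the empirical c.d.f. $F_n$ is simply a
sample average.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def empirical_entropy(density, sample):
    """Integral of -log f(x) dF_n(x), i.e. minus the mean log density."""
    return -sum(math.log(density(x)) for x in sample) / len(sample)


def empirical_information(f, g, sample):
    """Integral of log[f(x)/g(x)] dF_n(x), i.e. the mean log ratio."""
    return sum(math.log(f(x) / g(x)) for x in sample) / len(sample)


random.seed(1)
sample = [random.gauss(0.5, 1.0) for _ in range(50)]   # hypothetical data
f0 = lambda x: normal_pdf(x, 0.0)
f1 = lambda x: normal_pdf(x, 1.0)

# Mean log likelihood ratio, obtained by taking logarithms of the product form.
mean_log_ratio = sum(math.log(f1(x)) - math.log(f0(x)) for x in sample) / len(sample)
entropy_form = empirical_entropy(f0, sample) - empirical_entropy(f1, sample)
information_form = empirical_information(f1, f0, sample)
print(mean_log_ratio, entropy_form, information_form)
```

All three quantities agree up to rounding, so any one of them may be
compared with the transformed critical value.
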


Vasicek (1976) develops an entropy based test of normality with
critical region defined by

$$ H(\widetilde{fQ}) \le H_\alpha(m,n), \tag{2.3.22} $$

where $H(\widetilde{fQ})$, given by

$$ H(\widetilde{fQ}) = (1/n) \sum_{i=1}^{n} \log\{ (n/2m) (X_{(i+m)} - X_{(i-m)}) \}, \tag{2.3.23} $$

with $X_{(i)} = X_{(1)}$ for $i < 1$ and $X_{(i)} = X_{(n)}$ for $i > n$, is the
sample entropy of a nearest neighbor estimate of the density-quantile
function and $H_\alpha(m,n)$ is the corresponding critical value for
significance level $\alpha$. Dudewicz and Van der Meulen (1981)
investigate power properties of this procedure and conclude in
simulation studies that the test is competitive with existing g.o.f.
procedures. Vasicek shows that the sample entropy is consistent for
estimating $H(f)$. A similar procedure for the bivariate case will be
considered in Chapter 4.
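
The sample entropy based on spacings of order statistics is short to
compute. The following sketch assumes the boundary convention stated above
(order statistics are clamped at the smallest and largest observations);
the function name `vasicek_entropy`, the simulated normal sample, and the
choice $m = 20$ are illustrative assumptions, and no critical values are
computed.

```python
import math
import random


def vasicek_entropy(sample, m):
    """Sample entropy (1/n) sum log{(n/2m)(X_(i+m) - X_(i-m))} with clamped indices."""
    x = sorted(sample)
    n = len(x)
    if not (1 <= m < n / 2):
        raise ValueError("m must satisfy 1 <= m < n/2")
    total = 0.0
    for i in range(n):
        upper = x[min(i + m, n - 1)]   # X_(i) = X_(n) for i > n
        lower = x[max(i - m, 0)]       # X_(i) = X_(1) for i < 1
        total += math.log(n * (upper - lower) / (2.0 * m))
    return total / n


random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]   # hypothetical sample
estimate = vasicek_entropy(data, m=20)
normal_entropy = 0.5 * math.log(2.0 * math.pi * math.e)   # entropy of N(0, 1)
print(estimate, normal_entropy)
```

Rescaling the data by a constant $c > 0$ shifts the estimate by exactly
$\log c$, mirroring the behavior of $H(f)$ itself under a change of scale.
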

Since the bivariate normal is of special interest, we note that
the information between the joint p.d.f. and the product of the
marginals for this special case is given by

$$ I(f_{X,Y}; f_X f_Y) = -.5 \log(1 - \rho^2) \tag{2.3.24} $$

so that the information in this case is a function only of the
correlation coefficient $\rho$.
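
The closed form for the bivariate normal can be confirmed by direct
numerical integration of the defining double integral. The sketch below
assumes standardized marginals (which loses no generality, since the
information is invariant to location and scale), a midpoint rule on a
square grid, and a hypothetical function name.

```python
import math


def bivariate_normal_information(rho, limit=8.0, steps=320):
    """Midpoint-rule value of the double integral of log[f_XY/(f_X f_Y)] f_XY."""
    h = 2.0 * limit / steps
    c = 1.0 - rho * rho
    grid = [-limit + (i + 0.5) * h for i in range(steps)]
    marginal = [math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi) for t in grid]
    total = 0.0
    for x, fx in zip(grid, marginal):
        for y, fy in zip(grid, marginal):
            joint = math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * c))
            joint /= 2.0 * math.pi * math.sqrt(c)
            total += math.log(joint / (fx * fy)) * joint
    return total * h * h


rho = 0.6
numeric = bivariate_normal_information(rho)
closed_form = -0.5 * math.log(1.0 - rho * rho)
print(numeric, closed_form)
```

With $\rho = 0$ the integrand vanishes identically, in agreement with the
fact that the information is zero under independence.
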

The parametric approach to statistical inference using
information theory, with emphasis on classical normal theory, has been
studied extensively in the literature with Kullback (1978) providing a
fundamental reference. Only recently has information theory been
applied to nonparametric problems and exploratory data analysis. This
work attempts to contribute to the application of information theory
to such statistical problems.

2.4 Some Fundamental Concepts from Approximation Theory

Approximation theory has as its primary goal the approximation of
a function (or a graph or a curve). Several examples may illuminate
the need for such an approximation.

Example 2.4.1 The error function defined by

$$ \mathrm{erf}(x) = (2/\sqrt{\pi}) \int_0^x \exp(-y^2)\,dy \tag{2.4.1} $$

cannot be obtained explicitly for specified x since the integral on
the right hand side of (2.4.1) cannot be simplified. Hence, one seeks
to approximate erf(x) by approximating the integral for given x. One
solution is to employ numerical integration techniques to approximate
erf(x). Statisticians are interested in this problem because the
standard normal c.d.f. $\Phi(x)$ may be expressed by

$$ \Phi(x) = .5 + .5\,\mathrm{erf}(x/\sqrt{2}), \quad x > 0. \tag{2.4.2} $$
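
As a concrete instance of the numerical integration approach mentioned in
Example 2.4.1, the sketch below approximates erf(x) by Simpson's rule and
then evaluates the standard normal c.d.f. Simpson's rule and the names
`erf_simpson` and `std_normal_cdf` are choices made here for illustration;
the library function `math.erf` is used only as a check.

```python
import math


def erf_simpson(x, panels=200):
    """Approximate (2/sqrt(pi)) * integral_0^x exp(-y^2) dy; panels must be even."""
    if x == 0.0:
        return 0.0
    h = x / panels
    total = 1.0 + math.exp(-x * x)          # endpoint values exp(0) and exp(-x^2)
    for i in range(1, panels):
        weight = 4.0 if i % 2 else 2.0      # Simpson weights 4, 2, 4, ..., 4
        total += weight * math.exp(-(i * h) ** 2)
    return (2.0 / math.sqrt(math.pi)) * total * h / 3.0


def std_normal_cdf(x):
    """Standard normal c.d.f. via .5 + .5 erf(x / sqrt(2))."""
    return 0.5 + 0.5 * erf_simpson(x / math.sqrt(2.0))


print(erf_simpson(1.0), math.erf(1.0))
print(std_normal_cdf(1.96))
```
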
parameters $\theta = (\theta_1, \ldots, \theta_p)$ and $\{e_i\}$ are i.i.d. r.v.'s with known
distribution. The function $r(x, \theta)$ is known except for the
parameters. One seeks to approximate $r$ by estimating the parameters
based on the sample data.

Example 2.4.4 A set of data is generated by a probability
mechanism with probability density function f. If f is unknown, one
seeks to approximate f based on the observations in the sample data.
Chapter 3 presents several solutions to this problem.

Example 2.4.3 illustrates how approximation theory and
statistical estimation theory complement each other. However, Example
2.4.4 seems to be an exercise in statistical estimation alone. One
may find it difficult to distinguish between the terms "approximation"
and "estimation". In this work, approximation theory will refer to
the concepts and theorems employed to approximate a function with
known mathematical properties. The approximation may be obtained by
rigorous mathematical arguments or by working with criterion functions
and sample data. Estimation theory, on the other hand, must always
resort to sample data and hence has a stochastic element not essential
to approximation theory. In the context of Example 2.4.4, estimation
theory would treat f(x) as a "parameter" while approximation theory
would treat it as a function with specified properties. Any ambiguity
in the use of these terms should pose no serious obstacles in applying
them to problems of interest.

To approximate a function one usually must restrict f to belong
to a certain class of functions. Theory is then developed to
approximate functions in a given class. More generally, one may
consider a space of functions whose members possess certain
properties. The simplest class of functions might be the space of
constant functions, while one of the more complex classes might
consist of measurable functions. To begin the study of approximating
elements in a space of functions, several concepts will be introduced
that will be useful in defining function spaces.


Definition 2.4.1 A metric space (M,d) is a nonempty set M of
elements together with a real-valued function $d: M \times M \to R$ called a
metric that satisfies the following properties for all $x, y, z \in M$:

i) $d(x,y) \ge 0$;
ii) $d(x,y) = 0$ iff $x = y$;
iii) $d(x,y) = d(y,x)$; and
iv) $d(x,y) \le d(x,z) + d(z,y)$.

The function d is also called a distance, and property (iv) above is
called the triangle inequality because of its application to Euclidean
2-space with the metric $d(x,y) = |x - y|$.


Definition 2.4.2 A vector space (or linear space) over the reals
is a set of elements (called vectors) V together with two operations
(functions) $+: V \times V \to V$ and $\cdot: R \times V \to V$ which satisfy the
following properties for all $x, y, z \in V$ and $\lambda, \mu \in R$:

i) $x + y = y + x$;
ii) $(x + y) + z = x + (y + z)$;
iii) there exists $\theta \in V$ such that $x + \theta = x$ for all $x \in V$;
iv) $\lambda(x + y) = \lambda x + \lambda y$;
v) $(\lambda + \mu)x = \lambda x + \mu x$;
vi) $\lambda(\mu x) = (\lambda\mu)x$; and
vii) $0 \cdot x = \theta$, $1 \cdot x = x$.

The set of real numbers in this setting is also called a set of
scalars, hence in general one speaks of a vector space and a set of
scalars.

Definition 2.4.3 A real-valued function $\|\cdot\|: V \to R$ defined on a
vector space V is called a norm if the following properties are
satisfied for all $x, y \in V$ and $\lambda \in R$:

i) $\|x\| \ge 0$;
ii) $\|x\| = 0$ iff $x = \theta$;
iii) $\|x + y\| \le \|x\| + \|y\|$; and
iv) $\|\lambda x\| = |\lambda| \, \|x\|$.

A vector space that possesses such a norm is called, appropriately
enough, a normed vector space.

One immediately notes that $d(x,y) = \|x - y\|$ defines a metric and
hence a normed vector space is also a metric space. The concept of a
distance measure or metric gives one a firm grasp on many abstract
concepts.

Definition 2.4.4 Let $\{x_n\}$ be a sequence of vectors in a normed
vector space V with norm $\|\cdot\|$. The sequence $\{x_n\}$ is called a Cauchy
sequence if for given $\epsilon > 0$, there exists an N such that for all
$n \ge N$ and $m \ge N$, $\|x_n - x_m\| < \epsilon$.

Definition 2.4.5 A normed vector space V is said to be complete
if all Cauchy sequences converge, i.e., if $\{x_n\}$ is a Cauchy sequence,
then there exists $x \in V$ such that $\lim_{n \to \infty} x_n = x$. A complete
normed vector space is called a Banach space.

Definition 2.4.6 A set H is called an inner product space if it
is a vector space and if there exists a real-valued function
$(\cdot,\cdot): H \times H \to R$ called an inner product that satisfies the
following properties for all $x, y, z \in H$ and all $\lambda \in R$:

i) $(\lambda x, y) = \lambda (x, y)$;
ii) $(x + y, z) = (x, z) + (y, z)$;
iii) $(x, y) = (y, x)$; and
iv) $(x, x) > 0$ if $x \ne \theta$.

One may permit the inner product to be complex valued, i.e.,
$(\cdot,\cdot): H \times H \to C$, in which case property (iii) above becomes

iii) $(x, y) = \overline{(y, x)}$

where $\bar{z}$ is the complex conjugate of z. Note that $\|x\|^2 = (x,x)$
defines a norm so that an inner product space is also a normed vector
space.

Definition 2.4.7 A complete inner product space is called a
Hilbert space. (Some authors, e.g., Davis, 1975, give a more
restrictive definition of a Hilbert space, but this definition seems
fairly standard in the literature. See, e.g., Loève, 1977, Royden,
1968, or Hewitt and Stromberg, 1965.)

When one moves down the hierarchy of spaces defined above, the
transition from inner product to norm to metric will be understood to
follow the convention given unless otherwise noted. The hierarchy is
emphasized as follows:

Hilbert space → Banach space → normed vector space → metric space.

While restricting functions to belong to a Hilbert space may seem
severe, one will soon discover that many functions of statistical
importance fit nicely into a special class of Hilbert spaces called
the $L^2$ spaces.

Theorem 2.4.1 Let H be a Hilbert space with inner product $(\cdot,\cdot)$.
Then for all $x, y \in H$,

$$ |(x,y)| \le \|x\| \, \|y\|. \tag{2.4.5} $$

The inequality (2.4.5) is known as the Cauchy-Schwarz inequality.

Example 2.4.5 For two random variables X and Y possessing finite
second moments, Cov(X,Y) is an inner product. Thus,
$\sqrt{\mathrm{Var}(X)} = \sqrt{\mathrm{Cov}(X,X)}$ is a norm, and hence

$$ |\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)} \sqrt{\mathrm{Var}(Y)}. \tag{2.4.6} $$

This is the common statistical version of the Cauchy-Schwarz
inequality.
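
The statistical form of the Cauchy-Schwarz inequality holds equally for
sample covariances, since the sample covariance is an inner product on
centered data vectors. The sketch below uses simulated data and a
hypothetical helper `cov` (with divisor n) purely for illustration.

```python
import math
import random


def cov(a, b):
    """Sample covariance with divisor n; cov(a, a) is the sample variance."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n


random.seed(3)
x = [random.gauss(0.0, 1.0) for _ in range(200)]        # hypothetical data
y = [0.7 * u + random.gauss(0.0, 0.5) for u in x]

lhs = abs(cov(x, y))
rhs = math.sqrt(cov(x, x)) * math.sqrt(cov(y, y))
print(lhs, rhs, lhs <= rhs)

# Equality is attained when one variable is a linear function of the other.
z = [3.0 * u + 1.0 for u in x]
print(abs(cov(x, z)), math.sqrt(cov(x, x)) * math.sqrt(cov(z, z)))
```

Dividing both sides of the inequality by the right hand side gives the
familiar statement that a correlation coefficient lies in $[-1, 1]$.
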


Definition 2.4.8 Let H be a Hilbert space with inner product
$(\cdot,\cdot)$ and let $x, y \in H$. One says that x and y are orthogonal if
$(x,y) = 0$. A set $S \subset H$ is called an orthogonal system if for all
distinct elements $x, y \in S$, $(x,y) = 0$. Furthermore, if $\|x\| = 1$ for
all $x \in S$, S is called an orthonormal system.

Remark 2.4.1 If H is separable (see, e.g., Royden, 1968), then
every orthonormal system in H must be countable. This work need
only consider separable Hilbert spaces. Theorem 2.4.3 below will
illustrate why.

Definition 2.4.9 Let H be a separable Hilbert space and let
$\{\phi_k\}_{k=1}^{\infty}$ be an orthonormal system in H. The Fourier coefficients
w.r.t. $\{\phi_k\}$ of an element $x \in H$ are defined by

$$ \theta_k = (x, \phi_k). \tag{2.4.7} $$

Theorem 2.4.2 Let H be a separable Hilbert space and
$\{\theta_k\}_{k=1}^{\infty}$ be the Fourier coefficients w.r.t. $\{\phi_k\}_{k=1}^{\infty}$ of an
element $x \in H$. Then

$$ \sum_{k=1}^{\infty} \theta_k^2 \le \|x\|^2. \tag{2.4.8} $$

This is known as Bessel's inequality.

Definition 2.4.10 Let H be a separable Hilbert space and let
$\{\phi_k\}_{k=1}^{\infty}$ be an orthonormal system in H. If $(x, \phi_k) = 0$ for all k
implies that $x = \theta$ (here $\theta$ is the identity element), then
$\{\phi_k\}_{k=1}^{\infty}$ is said to be a complete orthonormal system.

The justification for the term "complete" becomes evident in the
following theorem. Royden (1968, p. 212) also gives motivation for
this usage.

Theorem 2.4.3 Let H be a separable Hilbert space. Then every
orthonormal system in H is countable and there exists a complete
orthonormal system. If $\{\phi_k\}_{k=1}^{\infty}$ is any complete orthonormal
system in H and $x \in H$, then

$$ x = \sum_{k=1}^{\infty} \theta_k \phi_k \tag{2.4.9} $$

where $\theta_k = (x, \phi_k)$. Equation (2.4.9) is said to be the Fourier series
representation of x and specifically means
$\lim_{m \to \infty} \| x - \sum_{k=1}^{m} \theta_k \phi_k \| = 0$.

Remark 2.4.2 Every complete orthonormal system in a separable
Hilbert space has the same number of elements. This number is called
the dimension of H.

Remark 2.4.3 If the dimension of H is finite, then H is a finite
dimensional vector space and any complete orthonormal system in H is a
basis for H. One may also consider countably infinite basis sets if
desired, in which case any complete orthonormal system in H is a basis
for H. The Gram-Schmidt orthonormalization process (see, e.g.,
Hewitt and Stromberg, 1965, pp. 240-242) permits construction of an
orthonormal system given any basis set for a vector space.

Remark 2.4.4 In the context of Theorem 2.4.3, it follows that for
$x \in H$,

$$ \|x\|^2 = \sum_{k=1}^{\infty} \theta_k^2. \tag{2.4.10} $$

This is known as Parseval's identity.
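
Bessel's inequality and Parseval's identity can be observed numerically
in a concrete Hilbert space. The sketch below works, as an assumption, in
the space of square integrable functions on (0, 1) with the cosine system
$\phi_0 = 1$, $\phi_k(x) = \sqrt{2}\cos(k\pi x)$ (indexed from zero for
convenience) and the element $f(x) = x$; the midpoint integration rule and
all names are illustrative choices.

```python
import math


def integrate(func, steps=2000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


def phi(k, x):
    """Cosine system on (0, 1): phi_0 = 1, phi_k = sqrt(2) cos(k pi x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * x)


def inner(f, g):
    return integrate(lambda x: f(x) * g(x))


f = lambda x: x
norm_squared = inner(f, f)                 # close to 1/3

# Fourier coefficients theta_k = (f, phi_k) and running sums of squares.
theta = [inner(f, lambda x, k=k: phi(k, x)) for k in range(40)]
partial_sums = []
running = 0.0
for value in theta:
    running += value * value
    partial_sums.append(running)

print(norm_squared, partial_sums[0], partial_sums[5], partial_sums[-1])
```

The partial sums never exceed $\|f\|^2$ (Bessel) and approach it as more
coefficients are included (Parseval).
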


Theorem 2.4.3 provides the foundation for many useful expansion
techniques used to approximate a function of interest. One need only
decide upon an appropriate orthonormal system in a Hilbert space of
functions to construct an approximation based upon equation (2.4.9).
First, however, an appropriate Hilbert space must be identified that
contains functions of interest.


Definition 2.4.11 A (Lebesgue) measurable function f is said to
belong to the space $L^p(a,b)$ if

$$ \int_a^b |f(x)|^p\,dx < \infty \tag{2.4.11} $$

for $1 \le p < \infty$.

If one defines an inner product between two functions f and g by

$$ (f,g) = \int_a^b f(x) \overline{g(x)}\,dx, \tag{2.4.12} $$

where $\overline{g(x)}$ is the complex conjugate of $g(x)$, then the
corresponding norm is given by

$$ \|f\| = \Big\{ \int_a^b |f(x)|^2\,dx \Big\}^{1/2}. \tag{2.4.13} $$

It can be shown that $L^2(a,b)$ is a separable Hilbert space, and hence
Theorem 2.4.3 applies for functions in $L^2(a,b)$. Let
$\{\phi_k(x)\}_{k=1}^{\infty}$ be a complete orthonormal system in $L^2(a,b)$.
Then for $f \in L^2(a,b)$, equation (2.4.9) becomes

$$ f(x) = \sum_{k=1}^{\infty} \theta_k \phi_k(x) \tag{2.4.14} $$

where

$$ \theta_k = (f, \phi_k) = \int_a^b f(x) \overline{\phi_k(x)}\,dx \tag{2.4.15} $$

are the Fourier coefficients w.r.t. $\{\phi_k(x)\}$.

Now, suppose one wishes to approximate a function $f \in L^2(a,b)$ by a
suitable finite expression. One solution is to choose the truncated
Fourier series representation

$$ f_m(x) = \sum_{k=1}^{m} \theta_k \phi_k(x). \tag{2.4.16} $$

Indeed, this approximation for f has some nice properties.

Theorem 2.4.4 Let H be a separable Hilbert space and let
$\{\phi_k\}_{k=1}^{\infty}$ be a complete orthonormal system in H. Then for any
$x \in H$,

$$ \Big\| x - \sum_{k=1}^{m} (x, \phi_k) \phi_k \Big\| \le \Big\| x - \sum_{k=1}^{m} \theta_k \phi_k \Big\| \tag{2.4.17} $$

for any choice of constants $\theta_1, \ldots, \theta_m$.

Observe that Theorem 2.4.4 implies that the best approximator w.r.t.
the least squares criterion for an element in a separable Hilbert
space is provided by the truncated Fourier series representation.
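
The least squares optimality of the truncated Fourier series can be
illustrated by comparing its squared error with that of competing
coefficient choices. The sketch below assumes the space of square
integrable functions on (0, 1), a cosine orthonormal system indexed from
zero, the target $f(x) = e^x$, and randomly perturbed coefficients as
competitors; none of these choices comes from the original text.

```python
import math
import random


def integrate(func, steps=1000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


def phi(k, x):
    """Cosine orthonormal system on (0, 1), indexed from k = 0."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * x)


def inner(f, g):
    return integrate(lambda x: f(x) * g(x))


f = math.exp
m = 4
fourier = [inner(f, lambda x, k=k: phi(k, x)) for k in range(m)]


def squared_error(coefficients):
    """Squared norm of f minus the m-term expansion with the given coefficients."""
    def residual(x):
        return f(x) - sum(c * phi(k, x) for k, c in enumerate(coefficients))
    return inner(residual, residual)


best = squared_error(fourier)
random.seed(4)
competitors = []
for _ in range(10):
    perturbed = [c + random.uniform(-0.1, 0.1) for c in fourier]
    competitors.append(squared_error(perturbed))

print(best, min(competitors))
```

Every perturbed coefficient vector yields a squared error at least as
large as that of the Fourier coefficients, as the theorem requires.
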

If one seeks a geometric interpretation of least squares
approximation, some fundamental definitions and notation are required.
Since a separable Hilbert space is also a vector space, one may employ
an analogy to vectors in Euclidean space and define for any two
elements x, y in a Hilbert space H the projection of x on y by

$$ \mathrm{proj}(x,y) = [(x,y)/(y,y)]\,y. \tag{2.4.18} $$

Using this interpretation, one sees that an element of a separable
Hilbert space may be expressed as the sum of the projections of the
element on the elements of a complete orthonormal system. Furthermore,
one observes that the orthonormal system $\{\phi_k\}_{k=1}^{m}$ defines a
subspace of the separable Hilbert space H. In this sense, the truncated
Fourier series representation for an element x in H is essentially the
projection of x into an m-dimensional subspace of H. For a clearer
exposition on the geometric interpretation of least squares
estimation, see Chapter 8 of Davis (1975).
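
The projection formula also drives the Gram-Schmidt process mentioned
earlier. The sketch below works in Euclidean 3-space as a finite
dimensional stand-in for a Hilbert space; the basis vectors, the element
x, and the function names are assumptions for illustration.

```python
import math


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def proj(x, y):
    """Projection of x on y: [(x,y)/(y,y)] y."""
    scale = inner(x, y) / inner(y, y)
    return [scale * b for b in y]


def gram_schmidt(vectors):
    """Turn a basis into an orthonormal system by removing projections in turn."""
    system = []
    for v in vectors:
        w = list(v)
        for u in system:
            p = proj(w, u)
            w = [a - b for a, b in zip(w, p)]
        length = math.sqrt(inner(w, w))
        system.append([a / length for a in w])
    return system


basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
system = gram_schmidt(basis)

# An element equals the sum of its projections on a complete orthonormal system.
x = [2.0, -1.0, 3.0]
pieces = [proj(x, u) for u in system]
rebuilt = [sum(p[i] for p in pieces) for i in range(3)]

# Keeping only the first two projections gives the projection into a 2-dimensional subspace.
truncated = [pieces[0][i] + pieces[1][i] for i in range(3)]
residual = [a - b for a, b in zip(x, truncated)]
print(rebuilt, truncated, residual)
```

The residual of the truncated sum is orthogonal to the subspace spanned
by the retained elements, which is the geometric content of least squares.
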

Many of the results stated above also hold if $\{\phi_k\}_{k=1}^{\infty}$ is merely
a system of orthogonal elements, except that terms involving $\|\phi_k\|$ may
have been omitted. (Recall, in the setting considered $\|\phi_k\| = 1$.) These
results are used extensively in the statistical literature, most
notably in the study of linear models. However, in many statistical
settings, finite-dimensional vector space theory is sufficient to
handle most problems of interest.

Many fundamental analysis texts discuss orthonormal systems for
the space $L^2(a,b)$ as well as for other spaces of functions. The most
popular systems include Jacobi polynomials, trigonometric systems, and
complex exponentials. Lanczos (1958), Rainville (1960), Davis (1975),
and Powell (1981) are some basic references on approximation theory
that identify various useful orthonormal systems.

For a discussion of some basic integration theorems and other
results for $L^p$ spaces, one may consult basic texts such as Royden
(1968) or Hewitt and Stromberg (1965).

Most of the discussion thus far has emphasized only
approximations by orthogonal expansion. Some results found in Bochner
(1955) are valuable for other types of approximation. Parzen (1962)
contains some useful essentials of approximation theory relevant to
kernel density estimation based on some of the elements of
approximation theory discussed by Bochner.

Theorem 2.4.5 Let $K(x)$ be a Borel measurable function satisfying

i) $\sup_x |K(x)| < \infty$;
ii) $\int_{-\infty}^{\infty} |K(x)|\,dx < \infty$; and
iii) $\lim_{x \to \infty} |x K(x)| = 0$.

Let $f(x)$ satisfy

iv) $\int_{-\infty}^{\infty} |f(x)|\,dx < \infty$.

Let $\{h(n)\}$ be a sequence of positive constants satisfying

v) $h(n) \to 0$ as $n \to \infty$.

Define approximating functions $f_n(x)$ by

$$ f_n(x) = \int_{-\infty}^{\infty} [1/h(n)]\, K[y/h(n)]\, f(x-y)\,dy. \tag{2.4.19} $$

Then at every continuity point x of f,

$$ f_n(x) \to f(x) \int_{-\infty}^{\infty} K(y)\,dy \quad \text{as } n \to \infty. \tag{2.4.20} $$

Result (2.4.20) illustrates why one usually makes the additional
restriction that $K(x)$ integrates to one. One calls K a kernel
function, and specific suggestions for K may be found in Bochner
(1955) or Parzen (1962).
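
The role of the integral of K in the limit can be seen numerically. The
sketch below assumes a uniform kernel equal to one on $[-1, 1]$, which
satisfies conditions (i) through (iii) but integrates to two rather than
one, and a Laplace density for f; both choices, the midpoint rule, and
the names are illustrative assumptions.

```python
import math


def K(y):
    """Uniform kernel: bounded, integrable, y K(y) -> 0; its integral is 2."""
    return 1.0 if abs(y) <= 1.0 else 0.0


def f(x):
    """Laplace density, continuous everywhere and integrable."""
    return 0.5 * math.exp(-abs(x))


def smoothed(x, h, steps=2000):
    """Integral of (1/h) K(y/h) f(x - y) dy, using that K vanishes off [-1, 1]."""
    dy = 2.0 * h / steps
    total = 0.0
    for i in range(steps):
        y = -h + (i + 0.5) * dy
        total += K(y / h) * f(x - y)
    return total * dy / h


x0 = 1.0
limit = 2.0 * f(x0)                      # f(x0) times the integral of K
for h in (0.5, 0.1, 0.01):
    print(h, smoothed(x0, h), limit)
```

As h decreases the smoothed value approaches twice $f(x_0)$, so a kernel
must integrate to one if the approximation is to recover f itself.
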

Parzen considers analogous results for the Fourier transforms

$$ k(u) = \int_{-\infty}^{\infty} \exp(-iux)\, K(x)\,dx \tag{2.4.21} $$

and

$$ \varphi(u) = \int_{-\infty}^{\infty} \exp(iux)\, f(x)\,dx, \tag{2.4.22} $$

and extends these results to the solution of problems of density
estimation. Some of these results will be mentioned in section 3.3.

Other results in approximation theory are applicable to
statistics. As suggested, a reference such as Abramowitz and Stegun
(1972) is very handy for computer implementation of approximation
theory results, and most such references require little mathematical
expertise. We have avoided discussion of spline approximation
techniques, as we will continue to do throughout this work, but useful
references such as Ahlberg, Nilson, and Walsh (1967) and Wahba (1971)
adequately discuss the topic.

2.5 Some Fundamental Stochastic Processes and Complex Regression

The standard linear regression model is usually written

$$ Y_i = \beta_0 + \sum_{k=1}^{p} \beta_k X_{ki} + \epsilon_i, \quad i = 1, \ldots, n, \tag{2.5.1} $$

where observations on the vector $X = (X_1, \ldots, X_p)$ are assumed to be
measured without error and the $\epsilon_i$ are uncorrelated random variables
with common mean zero and common positive finite variance $\sigma^2$. In
matrix notation, one writes

$$ Y = X\beta + \epsilon, \tag{2.5.2} $$

or one may express (2.5.1) by

$$ E(Y|X=x) = \beta_0 + \sum_{k=1}^{p} \beta_k x_k, \qquad \mathrm{Var}(Y|X=x) = \sigma^2. \tag{2.5.3} $$

The well known Gauss-Markov Theorem (see, e.g., Graybill, 1976) states
that under these equivalent model specifications, the least squares
estimate of $\beta$, namely

$$ \hat{\beta} = (X'X)^{-1} X'Y, \tag{2.5.4} $$

is the uniform minimum variance linear unbiased estimator (BLUE) of
$\beta$. Note that for conditions (2.5.3) one must also specify that
observations are obtained from a random sample to insure that the Y
values are uncorrelated. When one assumes a Gaussian model, i.e.,
with normally distributed error term, the least squares estimator
$\hat{\beta}$ is also a maximum likelihood estimator and assumes the
additional property of being a uniform minimum variance unbiased
estimator (UMVUE). Graybill (1976) summarizes the relevant statistical
facts about the linear regression model and considers the general
linear regression model allowing correlated error terms with specified
covariance matrix $\Sigma$. The general least squares (GLS) estimate of
$\beta$ is

$$ \hat{\beta} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}Y \tag{2.5.5} $$

when $\Sigma$ is known. When the covariance matrix is unknown, it must then
be estimated to obtain an estimate for $\beta$. Graybill (1976) discusses
conditions that the covariance matrix must satisfy to insure that the
ordinary least squares (OLS) estimator given by (2.5.4) remains a
UMVUE under the more general setting. Estimation of the covariance
matrix poses some serious problems to obtaining statistical properties
for GLS estimators of the coefficient vector. Just such a problem
occurs in the density estimation procedure discussed in section 4.3.
Unfortunately, when the properties of the error vector are only
approximately known, one must seek heuristic solutions to the
estimation problem.
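
The OLS and GLS formulas can be evaluated directly for small problems.
The sketch below uses plain Python lists as matrices with a Gauss-Jordan
solver so that it is self-contained; the straight-line design, the fixed
perturbations standing in for errors, and the assumed known covariance
matrix with entries $0.5^{|i-j|}$ are all hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a z = b (b is a matrix) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i][j] / aug[i][i] for j in range(n, len(aug[i]))] for i in range(n)]


def ols(X, Y):
    """Ordinary least squares: (X'X)^{-1} X'Y."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


def gls(X, Y, Sigma):
    """General least squares for known Sigma: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y."""
    Xt = transpose(X)
    return solve(matmul(Xt, solve(Sigma, X)), matmul(Xt, solve(Sigma, Y)))


n = 6
X = [[1.0, float(t)] for t in range(n)]                  # intercept and one regressor
signal = [[2.0 + 3.0 * t] for t in range(n)]             # noise-free responses
noise = [0.10, -0.20, 0.05, 0.00, -0.10, 0.15]           # fixed hypothetical errors
Y = [[s[0] + e] for s, e in zip(signal, noise)]
Sigma = [[0.5 ** abs(i - j) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

print("OLS:", ols(X, Y))
print("GLS:", gls(X, Y, Sigma))
```

When $\Sigma$ is proportional to the identity the two estimates coincide,
which is the uncorrelated, equal variance case covered by the
Gauss-Markov Theorem.
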

Parzen (1961) considers a general setting applicable to time
series analysis. Essentially, (2.5.1) represents a discrete parameter
stochastic process. The continuous parameter analog may be written

$$ Y(t) = m(t) + Z(t), \quad t \in T, \tag{2.5.6} $$

where $m(t)$ is a mean value function and $Z(t)$ is a stochastic process
with specified properties. One usually assumes that $Z(t)$ has zero
mean and covariance kernel

$$ K(s,t) = E[Z(s)Z(t)]. \tag{2.5.7} $$

Furthermore, as in the linear regression model, one assumes that $Y(t)$
is observable while $Z(t)$ is not. The following development mirrors
Parzen (1961).

Definition 2.5.1 A Hilbert space H with inner product $(\cdot,\cdot)$ is
said to be a Reproducing Kernel Hilbert Space (RKHS) with reproducing
kernel K if members of H are functions defined on a set T and K is a
function on $T \times T$ with the following properties for every t in T:

i) $K(\cdot,t)$, as a function of its first argument, is in H, and
ii) $(g, K(\cdot,t)) = g(t)$ for every g in H.

Two special stochastic processes are of interest in establishing a
parametric theory for continuous parameter regression analysis.

Definition 2.5.2 A stochastic process $\{X(t), t \in [0,\infty)\}$ is said to
be a Wiener process if $X(0) = 0$ and

i) $\{X(t), t \in [0,\infty)\}$ has stationary independent increments;
ii) for each $t > 0$, $X(t)$ is normally distributed; and
iii) for all $t > 0$, $E[X(t)] = 0$.

Note that knowledge of the variance of the Wiener process completely
characterizes its probability law. For $0 \le s \le t$, one observes that

$$ \mathrm{Var}[X(t) - X(s)] = \sigma^2 (t - s). \tag{2.5.8} $$

A Wiener process is a special case of a normal process. A normal
process requires that all finite dimensional distributions be jointly
normal. Another special case of a normal process is given by the
following definition.

Definition 2.5.3 A stochastic process $\{B(t), t \in [0,1]\}$ is called a
Brownian Bridge process if it is a normal process with zero mean value
function and covariance kernel

$$ K(s,t) = \min(s,t) - st. \tag{2.5.9} $$

An important regression model applicable to the density estimation
approach of section 4.3 is given by

$$ Y(t) = \sum_{k=1}^{m} \theta_k \phi_k(t) + B(t), \quad 0 \le t \le 1, \tag{2.5.10} $$

where $\{\phi_k(t), k = 1, \ldots, m\}$ is a complete system of orthogonal
functions in a finite dimensional subspace of $L^2(0,1)$ and
$\{B(t), 0 \le t \le 1\}$ is a Brownian Bridge process. Eubank (1979)
discusses optimal designs for estimating the $\{\theta_k, k = 1, \ldots, m\}$
based on a finite grid of points in [0,1]. One may not have the option
to design an experiment to take advantage of these results, however.
We shall not consider the problem of optimal designs for such models.

Experience indicates that the OLS estimation techniques often
compare favorably with GLS methods based on estimating an unknown
covariance matrix. However, the results are not satisfactory when the
off diagonal elements of the discrete covariance structure do not
vanish as one moves away from the diagonal. The applications discussed
in later chapters avoid such situations.
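
A small simulation can illustrate both the Brownian Bridge covariance
kernel and OLS estimation in a regression with Brownian Bridge errors.
The sketch below makes several assumptions not found in the text: the
bridge is generated as $W(t) - tW(1)$ from a Wiener process with unit
variance parameter, a single basis function $\sqrt{2}\cos(\pi t)$ is used
with true coefficient 1.5, and the grid and number of replications are
arbitrary.

```python
import math
import random


def brownian_bridge(grid, rng):
    """Simulate B(t) = W(t) - t W(1) on an increasing grid of points in (0, 1)."""
    w, previous, path = 0.0, 0.0, []
    for t in grid:
        w += rng.gauss(0.0, math.sqrt(t - previous))   # independent normal increment
        path.append(w)
        previous = t
    w_one = w + rng.gauss(0.0, math.sqrt(1.0 - previous))
    return [value - t * w_one for value, t in zip(path, grid)]


def kernel(s, t):
    """Covariance kernel of the Brownian Bridge: min(s, t) - s t."""
    return min(s, t) - s * t


rng = random.Random(5)
grid = [0.1 * j for j in range(1, 10)]
phi = [math.sqrt(2.0) * math.cos(math.pi * t) for t in grid]   # assumed basis function
phi_norm = sum(p * p for p in phi)
theta = 1.5                                                    # assumed true coefficient
replications = 20000

cross_sum = 0.0
theta_sum = 0.0
for _ in range(replications):
    b = brownian_bridge(grid, rng)
    cross_sum += b[2] * b[6]                      # product B(0.3) B(0.7)
    y = [theta * p + e for p, e in zip(phi, b)]
    theta_sum += sum(u * p for u, p in zip(y, phi)) / phi_norm   # OLS on the grid

empirical_cov = cross_sum / replications
mean_theta_hat = theta_sum / replications
print(empirical_cov, kernel(grid[2], grid[6]))
print(mean_theta_hat, theta)
```

The OLS estimate ignores the covariance structure of the errors yet
remains unbiased here; its efficiency relative to GLS is a separate
question that the simulation does not address.
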

For the model (2.5.10), Parzen (1961) shows that the maximum
likelihood estimates of the parameters are given by

$$ \hat{\theta}_k = (Y, \phi_k), \quad k = 1, \ldots, m, \tag{2.5.11} $$

where $(\cdot,\cdot)$ is the inner product of the Hilbert function space
generated by the reproducing kernel $K(s,t)$ of the Brownian Bridge
process given in (2.5.9). For nonlinear mean value functions or for
infinite expansions utilizing a countable system of basis functions,
the problem is more complicated. Desirable properties for the
estimates are stated in Parzen (1961), and references are given to
proofs from other sources. Specific representations for the estimates
will be given in later chapters under nonparametric settings without
exploiting the reproducing kernel property. Such generalizations may
be desired but will be left to the more mathematically sophisticated
researcher.

In some situations, the orthonormal system $\{\phi_k,\ k=1,\ldots,m\}$ will be
composed of complex valued functions. Brillinger (1975) discusses
some generalizations of least squares theory to handle this situation.
A complex matrix H is said to be Hermitian if the transpose of H is
equal to the conjugate of H. This extends the notion of symmetry for
real matrices. One writes H is Hermitian if $H' = H$, where the prime
denotes the conjugate transpose. The definition of
nonnegative definiteness readily extends to complex systems. A matrix
H is said to be nonnegative definite if

$$\sum_{j=1}^{m} \sum_{k=1}^{m} \bar{a}_j a_k H_{jk} \ge 0 \tag{2.5.12}$$

for all complex constants $a_1,\ldots,a_m$, where $H = (H_{jk})$ is an m by m
complex matrix. The matrices H'H and HH' are always Hermitian and
nonnegative definite.

Now,  consider  the  complex  regression  model 
$$Y = H\theta + \varepsilon, \tag{2.5.13}$$

where Y is an n×1 observable random vector, the elements of the n×p
matrix H are measured without error, $\theta$ is a p×1 vector of parameters,
and $\varepsilon$ is an unobservable n×1 vector of zero mean uncorrelated random
variables with finite variance $\sigma^2$. Then

$$(Y - H\theta)'(Y - H\theta) \tag{2.5.14}$$

is minimized over all choices of $\theta$ when $\theta$ is estimated by

$$\hat{\theta} = (H'H)^{-1} H'Y \tag{2.5.15}$$

for nonsingular H'H. Equation (2.5.15) represents the least squares
estimates of the parameters in (2.5.13). When the corresponding
elements are real, this reduces to the usual least squares formula.
Observe that Y and $\varepsilon$ may be real random vectors and still support
complex parameters and design matrix, the only restriction being that
the imaginary components of the product $H\theta$ must vanish.
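
As a hedged illustration of (2.5.15), the following minimal Python sketch
computes the complex least squares estimate with NumPy. The function name
`complex_least_squares`, the design matrix built from a conjugate pair of
complex exponentials, and the simulated data are assumptions made only for
this example; H' is taken to be the conjugate transpose.

```python
import numpy as np

def complex_least_squares(H, y):
    """Return (H'H)^{-1} H'y, where H' is the conjugate transpose of H."""
    H = np.asarray(H, dtype=complex)
    y = np.asarray(y, dtype=complex)
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh @ y)

# Illustrative data (assumption): a real response with a complex design matrix.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
H = np.column_stack([np.exp(2j * np.pi * t), np.exp(-2j * np.pi * t)])
theta = np.array([0.5 - 0.25j, 0.5 + 0.25j])   # conjugate pair, so H @ theta is real
y = (H @ theta).real + 0.01 * rng.standard_normal(t.size)
theta_hat = complex_least_squares(H, y)
```

Because the response is real and the two columns of H are conjugates of each
other, the two estimated coefficients come out as a conjugate pair, which is
the vanishing-imaginary-part restriction mentioned above.
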
3.  A SURVEY OF NONPARAMETRIC DENSITY ESTIMATION

3.1  Introduction 

In some general cases, parameter estimation is a form of
parametric density estimation. Estimating the mean of an exponential
distribution or the mean and variance of a normal distribution
provides an estimate of the density for that particular distribution.
Many goodness-of-fit procedures combine parametric and nonparametric
density estimation procedures to arrive at a test statistic for a
specified null distribution. The applications of density estimation,
however, extend to many areas of statistics. Many parameters of
interest are functionals involving the parent density of a data set,
so estimating a density can lead indirectly to estimating a parameter
of interest. For example, as mentioned in section 2.2, the statistic $\bar{X}_n$
is often written

$$\bar{X}_n = \int_{-\infty}^{\infty} x \, dF_n(x) = (1/n) \sum_{j=1}^{n} X_j, \tag{3.1.1}$$

where $F_n(x)$ is the empirical c.d.f. based on a random sample of size
n, thus emphasizing that $\bar{X}_n$ is an estimator of the parameter

$$\mu = \int_{-\infty}^{\infty} x \, dF(x). \tag{3.1.2}$$

Writing the above as a Riemann integral, one may wish to form
estimates

$$\hat{\mu} = \int_{-\infty}^{\infty} x \hat{f}_n(x) \, dx \tag{3.1.3}$$

where the integration may be performed numerically. The common
grouped data formula for the mean given in many elementary statistics
texts is actually the integral (3.1.3) with a histogram of the data set
used as the density estimate. Thus, nonparametric density estimation
provides a basis for attacking many statistical problems from a
nonparametric viewpoint.
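
The remark about the grouped data formula can be checked numerically. The
sketch below is only illustrative: the function names, the simulated sample,
and the choice of 12 classes are assumptions. It computes the elementary
grouped data mean and the integral of x times the histogram density, which
agree up to rounding error.

```python
import numpy as np

def grouped_mean(counts, edges):
    """Elementary grouped data formula: sum of midpoint times relative frequency."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return float(np.sum(midpoints * counts) / np.sum(counts))

def histogram_integral_mean(counts, edges):
    """Integral of x * f_hat(x) dx, where f_hat is the histogram density."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    heights = counts / (counts.sum() * np.diff(edges))   # histogram density per class
    # The integral of x over [a, b] is (b^2 - a^2) / 2, so each class is exact.
    return float(np.sum(heights * (edges[1:] ** 2 - edges[:-1] ** 2) / 2.0))

rng = np.random.default_rng(0)
data = rng.normal(10.0, 2.0, size=500)        # illustrative sample (assumption)
counts, edges = np.histogram(data, bins=12)
mean_grouped = grouped_mean(counts, edges)
mean_integral = histogram_integral_mean(counts, edges)
```
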

Silverman (1980) observes that density estimation is of
fundamental importance in exploratory data analysis and has many
applications in confirmatory analysis. Silverman notes, however, that
"...density estimation cannot be used as a 'back of an envelope'
exploratory technique..." like many of the techniques of Tukey (1977),
but he does not see this as a disadvantage. In fact, the current
state of computer technology and the availability of sophisticated
statistical software should make one question any serious exploratory
analysis that does not include some form of nonparametric density
estimation.

Bean and Tsokos (1980), Tapia and Thompson (1978), Tartar and
Kronmal (1978), Carmichael (1976), and Wertz (1980) provide useful
bibliographic and expository information regarding the nonparametric
estimation of densities and related functions. However, the abundance
of literature on the subject should not disguise the fact that the
area of nonparametric density estimation is rich with unsolved
problems. The fundamental weakness of most procedures is the
subjectivity required in choosing a "smoothness parameter". Some
objective techniques for handling this problem have been suggested,
but in general the problem presents a serious obstacle to the
applicability of most density estimation methods.

This chapter will deal with the major classifications of
nonparametric density estimation techniques, giving details of some of
the more popular procedures. Comparisons of some of the procedures
will be made in Chapter 6, but no Monte Carlo study has been attempted
because of the difficulty in handling the subjective smoothing
requirements. Anderson (1969, as referenced in Bean and Tsokos, 1980)
and Scott and Factor (1981) consider such studies for a restricted set
of estimators, but their findings are somewhat inconclusive in terms
of the general area of nonparametric density estimation.

Many of the techniques discussed have multivariate extensions.
These will be mentioned or referenced, but attention to bivariate
density estimation will be withheld until Chapter 4. Both univariate
and bivariate techniques will be important in the study of bivariate
data analysis. Before considering the techniques, some preliminary
concepts need to be introduced.

For the following definitions it will be understood that $\hat{f}_n$ is an
estimate of an unknown p.d.f. f based on a random sample of size n.

Definition 3.1.1 The mean squared error (MSE) of $\hat{f}_n(x)$ is defined
by

$$\mathrm{MSE}[\hat{f}_n(x)] = E\{|\hat{f}_n(x) - f(x)|^2\}. \tag{3.1.1}$$

If $\mathrm{MSE}[\hat{f}_n(x)] \to 0$ as $n \to \infty$, then one says that $\hat{f}_n(x)$ is pointwise
consistent in quadratic mean or pointwise consistent in mean square
for estimating f(x). If $\sup_x \{\mathrm{MSE}[\hat{f}_n(x)]\} \to 0$ as $n \to \infty$, then one says
that $\hat{f}_n$ is uniformly consistent in quadratic mean for estimating f.

Definition 3.1.2 The mean integrated square error (MISE) of $\hat{f}_n$ is
given by

$$\mathrm{MISE}(\hat{f}_n) = E\left\{ \int_{-\infty}^{\infty} |\hat{f}_n(x) - f(x)|^2 \, dx \right\}. \tag{3.1.2}$$

If $\mathrm{MISE}(\hat{f}_n) \to 0$ as $n \to \infty$, then one says that $\hat{f}_n$ is uniformly consistent
with respect to MISE for estimating f.

The definitions of weak and strong consistency apply to $\hat{f}_n(x)$
pointwise, while uniform consistency will imply

$$\sup_x |\hat{f}_n(x) - f(x)| \to 0 \tag{3.1.3}$$

in probability (weakly) or almost surely (strongly).

Two theorems are often exploited regarding applications to
density estimation.

Theorem 3.1.1 Let $X_1, X_2, \ldots$, and X be random p-vectors and let
$g: R^p \to R$ be a real-valued measurable function that is continuous w.p.1.
Then

i) $X_n \xrightarrow{a.s.} X$ implies $g(X_n) \xrightarrow{a.s.} g(X)$;
ii) $X_n \xrightarrow{P} X$ implies $g(X_n) \xrightarrow{P} g(X)$; and
iii) $X_n \xrightarrow{d} X$ implies $g(X_n) \xrightarrow{d} g(X)$.

Serfling (1980, pp. 24-25) proves this theorem and references the more
general case where g is vector-valued. By application of this
theorem, one obtains the following useful result.

Theorem 3.1.2 Let $X_n$ be $AN(\mu, \sigma_n^2)$ with $\sigma_n \to 0$. Let g be a
real-valued measurable function that is differentiable at $x = \mu$, with
$g'(\mu) \ne 0$. Then

$g(X_n)$ is $AN(g(\mu), [g'(\mu)]^2 \sigma_n^2)$.

As will be seen later, these theorems are usually applied for
g(x) = log(x). Serfling discusses applications to such areas as
variance stabilizing transformations and gives several examples
utilizing choices for g.
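
For g(x) = log(x) one has $g'(\mu) = 1/\mu$, so Theorem 3.1.2 gives log($X_n$) as
$AN(\log \mu, \sigma_n^2/\mu^2)$. The short simulation below is a sketch under stated
assumptions: $X_n$ is taken to be the mean of n exponential variables with
mean $\mu$ (so that $\sigma_n^2 = \mu^2/n$), and the sample sizes, seed, and function
name are illustrative choices, not part of the original development.

```python
import numpy as np

def delta_method_variance(g_prime_at_mu, sigma_n_sq):
    """Asymptotic variance [g'(mu)]^2 * sigma_n^2 from Theorem 3.1.2."""
    return g_prime_at_mu ** 2 * sigma_n_sq

rng = np.random.default_rng(1)
mu, n, reps = 2.0, 400, 5000
# Each row is one sample of size n; its mean is approximately AN(mu, mu^2 / n).
xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
empirical_var = np.log(xbar).var()
predicted_var = delta_method_variance(1.0 / mu, mu ** 2 / n)   # equals 1 / n
```

Note that the predicted variance 1/n no longer involves $\mu$, which is the
variance stabilizing property exploited later in the chapter.
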

3.2  Nearest  Neighbor  Density  Estimation 

Loftsgaarden and Quesenberry (1965) attack the problem of
estimating a multivariate density function and arrive at a fairly
simple method that possesses desirable properties. Their work appears
before that of Cacoullos (1966), who generalized the kernel approach to
the multivariate case, and thus represents the first formal
development of a technique for multivariate nonparametric density
estimation.

Let $X_1,\ldots,X_n$ be i.i.d. random p-vectors with absolutely
continuous c.d.f. $F(x_1,\ldots,x_p)$ and p.d.f. $f(x_1,\ldots,x_p)$. Define

$$V_k(x) = \text{volume of smallest sphere centered at } x \text{ containing at least } k \text{ points.} \tag{3.2.1}$$

Recall, a p-dimensional sphere with radius r has volume

$$V = \pi^{p/2} r^p / \Gamma(p/2 + 1), \tag{3.2.2}$$

where $\Gamma(\cdot)$ is the gamma function. The nearest neighbor density
estimate of f(x) is given by

$$\hat{f}_n(x) = (k/n) / V_k(x), \tag{3.2.3}$$

where k = k(n) is chosen to satisfy certain limiting properties.
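
The estimate (3.2.3) is simple to compute. The following Python sketch is one
possible implementation, assuming Euclidean distance; the function names and
the simulated standard normal sample are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi

def sphere_volume(r, p):
    """Volume of a p-dimensional sphere of radius r, as in (3.2.2)."""
    return pi ** (p / 2) * r ** p / gamma(p / 2 + 1)

def knn_density(x, sample, k):
    """Nearest neighbor estimate (k/n) / V_k(x), as in (3.2.3)."""
    sample = np.asarray(sample, dtype=float)
    if sample.ndim == 1:                       # treat a 1-D array as n points in R^1
        sample = sample[:, None]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n, p = sample.shape
    distances = np.linalg.norm(sample - x, axis=1)
    radius = np.sort(distances)[k - 1]         # smallest radius holding k points
    return (k / n) / sphere_volume(radius, p)

rng = np.random.default_rng(0)
sample = rng.standard_normal(20000)            # illustrative data (assumption)
estimate_at_zero = knn_density(0.0, sample, k=200)
```

In the univariate case the "sphere" is the interval [x - r, x + r], whose
volume 2r is recovered by (3.2.2) with p = 1.
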

Loftsgaarden and Quesenberry (1965) show that $\hat{f}_n(x)$ is weakly
consistent for estimating f(x). Devroye and Wagner (1977) show that
under the conditions

i) $k(n) \to \infty$ and $k(n)/n \to 0$;
ii) $k(n)/\log(n) \to \infty$; and
iii) f is uniformly continuous on $R^p$,                    (3.2.4)

$\hat{f}_n(x)$ is strongly uniformly consistent for estimating f(x), i.e.,

$$\sup_x |\hat{f}_n(x) - f(x)| \to 0 \quad \text{a.s.} \tag{3.2.5}$$


For our purposes, we emphasize two theorems due to Moore and
Yackel (1977b) and restrict our attention to the univariate case.

Theorem 3.2.1 Let $\hat{f}_n(x)$ be given by (3.2.3) and let the following
properties hold:

i) f is continuous at x;
ii) $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$; and
iii) $k(n)/\log(\log(n)) \to \infty$.

Then $\hat{f}_n(x) \to f(x)$ a.s.

Theorem 3.2.2 Let $\hat{f}_n(x)$ be given by (3.2.3) and let the following
properties hold:

i) $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$; and
ii) $\sqrt{k(n)}\,|f(x_n) - f(x)| \to 0$ in probability when $|x_n - x| \le R(n)$,

where R(n) is the radius of the sphere yielding $V_k(x)$. Then

$$\sqrt{k(n)}\,[\hat{f}_n(x) - f(x)] \xrightarrow{d} N[0, f^2(x)],$$

i.e., $\hat{f}_n(x)$ is $AN[f(x), f^2(x)/k(n)]$.

As suggested previously, one would prefer that the asymptotic
variance did not include an object to be estimated. Theorem 2.3.7 is
useful in suggesting variance stabilizing transformations for such
cases. Observe, using the log transformation and applying Theorem
2.3.7, one obtains that $\log[\hat{f}_n(x)]$ is $AN[\log f(x), 1/k(n)]$. This result is
particularly important to expansion techniques for log f(x) to be
considered in section 4.3.
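
One practical consequence of the stabilized variance 1/k(n) is that an
approximate interval for f(x) needs no estimate of the variance. The helper
below is a hypothetical sketch, not a procedure given in the sources cited:
it forms the interval on the log scale and transforms back, with z = 1.96
assumed for nominal 95% coverage.

```python
import math

def knn_log_interval(f_hat, k, z=1.96):
    """Approximate interval for f(x) from log f_hat being AN(log f, 1/k)."""
    half_width = z / math.sqrt(k)          # half width on the log scale
    return f_hat * math.exp(-half_width), f_hat * math.exp(half_width)

lower, upper = knn_log_interval(0.3, k=100)
```

The interval is multiplicative around the estimate, so both limits stay
positive, which an interval built directly on $\hat{f}_n(x)$ would not guarantee.
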

Moore  and  Yackel  (1977a, 1977b)  and  Mack  and  Rosenblatt  (1979) 
consider  a  more  general  representation  than  (3.2.3)  and  suggest 
analogies  to  kernel  density  estimation.  The  asymptotic  properties  of 
nearest  neighbor  estimators  thus  mirror  those  of  kernel  density 
estimators.  Kernel  estimators  are  considered  in  the  next  section. 

3.3  Kernel Density Estimation

The  kernel  method  of  density  estimation  provides  a  natural 
extension  to  the  popular  histogram  estimator  and  has  a  firm  foundation 
of  approximation  theory  results  to  support  its  use.  Rosenblatt  (1956) 
considers  this  extension  of  the  histogram  approach,  and  Parzen  (1962) 
details  the  theoretical  implications  of  this  technique. 

Observe, a histogram estimator of f may be constructed such that
the partition of the support of f is composed of equally spaced
intervals. Consider the estimator due to Rosenblatt (1956) defined by

$$\hat{f}_n(x) = \{F_n(x+h) - F_n(x-h)\} / (2h), \tag{3.3.1}$$

where h = h(n) is a real valued positive function of the sample size n
with $h(n) \to 0$ as $n \to \infty$. Rosenblatt shows that for $h(n) = k n^{-\alpha}$, optimal
values for k and $\alpha$ may be chosen based on asymptotic mean square error
or integrated asymptotic mean square error considerations.

Now, if one lets the kernel K be defined by K(u) = 1/2 for |u| < 1
and K(u) = 0 elsewhere, then (3.3.1) becomes

$$\hat{f}_n(x) = [1/(n h(n))] \sum_{j=1}^{n} K[(x - X_j)/h(n)]. \tag{3.3.2}$$

Thus, a more general approach is suggested using the kernel K(u) as a
weight function, and forming estimators

$$\hat{f}_n(x) = \int_{-\infty}^{\infty} [1/h(n)]\, K[(x-y)/h(n)] \, dF_n(y) = [1/(n h(n))] \sum_{j=1}^{n} K[(x - X_j)/h(n)]. \tag{3.3.3}$$
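
The equivalence of (3.3.1) and (3.3.2) for the uniform kernel, and the
generalization (3.3.3), can be sketched in a few lines of Python. The
function names, the Gaussian kernel option, the window width h = 0.4, and
the simulated sample are assumptions chosen only for illustration.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical c.d.f. F_n evaluated at x (scalar or array)."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, x, side="right") / ordered.size

def naive_estimate(sample, x, h):
    """Rosenblatt's difference quotient of the empirical c.d.f., as in (3.3.1)."""
    return (ecdf(sample, x + h) - ecdf(sample, x - h)) / (2.0 * h)

def uniform_kernel(u):
    return np.where(np.abs(u) < 1.0, 0.5, 0.0)

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_estimate(sample, x, h, kernel=uniform_kernel):
    """Kernel estimate [1/(n h)] * sum_j K[(x - X_j)/h], as in (3.3.3)."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (sample.size * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(300)              # illustrative data (assumption)
grid = np.linspace(-6.0, 6.0, 1201)
h = 0.4
naive = naive_estimate(sample, grid, h)
uniform = kernel_estimate(sample, grid, h)
gauss = kernel_estimate(sample, grid, h, kernel=gaussian_kernel)
```

The uniform kernel reproduces the difference quotient exactly (barring a
sample point falling precisely on x ± h), while a smooth kernel such as the
Gaussian gives a smooth estimate that still integrates to one.
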

Parzen  (1962)  gives  a  table  of  some  common  kernels  and  develops 
conditions  that  K  (u)  must  satisfy  to  obtain  desirable  statistical 
properties  for  the  kernel  estimator.  Two  theorems  are  of  importance 
to  us. 

Theorem 3.3.1 Let K(u) be a kernel satisfying

i) $\sup_x |K(x)| < \infty$;
ii) $\int_{-\infty}^{\infty} |K(x)| \, dx < \infty$;
iii) $\lim_{x \to \infty} |xK(x)| = 0$; and
iv) $\int_{-\infty}^{\infty} K(x) \, dx = 1$.

Let f(x) be continuous at x, and let $nh(n) \to \infty$ as $n \to \infty$. Then

$\hat{f}_n(x) \to f(x)$ in q.m.

Theorem 3.3.2 Under the conditions of Theorem 3.3.1, one also has
that $\hat{f}_n(x)$ is $AN[E(\hat{f}_n(x)),\ \mathrm{Var}\{K[(x-X)/h(n)]\}/(n h^2(n))]$.

Silverman (1978) gives the stronger result of almost sure uniform
consistency.

The basic problem one faces when using the kernel method is the
choice of window width h(n). Different window widths produce
different shapes and in particular introduce the problem of
identifying spurious modes. Some degree of subjectivity is hence
required to arrive at an acceptable shape for the estimated p.d.f.
Silverman (1980) suggests an objective approach to choosing a window
width, but the approach necessitates estimation of variance terms
whose properties are questionable. Other authors have suggested
objective techniques such as cross-validation and "ridge regression"
that seem promising but still display weaknesses that cannot be
ignored. Nonetheless, kernel density estimation has been extensively
studied in the literature and is competitive with other techniques.
Cacoullos (1966) extends this technique to the multivariate case.
3.4  Function Approximation Methods

A problem in mathematical analysis is to approximate a member of
a suitably restricted class of functions {f(x)} by a series expansion
technique. The simplest approach seeks coefficients $\{a_j\}_{j=0}^{\infty}$ such
that

$$f(x) = \sum_{j=0}^{\infty} a_j x^j \tag{3.4.1}$$

in the sense that

$$\lim_{m \to \infty} \Big| f(x) - \sum_{j=0}^{m} a_j x^j \Big| = 0. \tag{3.4.2}$$

Functions expressible as in (3.4.1) are called entire functions
(Davis, 1975). When f(x) is a density, one might consider estimators

$$\hat{f}_m(x) = \sum_{j=0}^{m} \hat{a}_j x^j, \tag{3.4.3}$$

where $\hat{a}_j$ is a function of sample data and m is the order of
approximation determined by some meaningful criterion. The usual
series expansion for f(x) is a Taylor series, the simplest case given
by (3.4.1) with

$$a_j = f^{(j)}(0) / (j!), \quad j = 0, 1, \ldots, \tag{3.4.4}$$

where $f^{(j)}(0)$ is the j-th derivative of f(x) evaluated at 0. One
could then base (3.4.3) on estimates of the derivatives of f(x).
However, this is usually cumbersome and inefficient in practice.

A more general expansion is given by

$$f(x) = \sum_{j=-\infty}^{\infty} \theta_j \phi_j(x), \quad a \le x \le b, \tag{3.4.5}$$

where $\{\theta_j\}_{j=-\infty}^{\infty}$ are real valued constants and $\{\phi_j(x)\}_{j=-\infty}^{\infty}$ is a system
of real or complex valued functions. One may then study conditions
under which the expansion (3.4.5) is justified. This problem has been
studied extensively in the mathematical literature and has recently
been applied to statistical problems of density estimation.

Following the development of section 2.4, let H be a separable
Hilbert space and let $\{\phi_k(x)\}$ be a complete orthonormal system in
H. For most statistical applications, the space of square integrable
functions $L^2(a,b)$ is general enough to include a large family of
p.d.f.'s and restrictive enough to permit formulation of useful
theory; thus, we will henceforth restrict attention to the Hilbert
space of square integrable functions. If one chooses an arbitrary
member of this space, say f(x), then one is justified in using the
expansion (3.4.5) with x in the interval (a,b). Mathematical analysts
have studied this expansion for a variety of orthogonal systems of
functions. Series expansions of the form (3.4.1) may be translated to
(3.4.5) by the Gram-Schmidt orthogonalization of the basis set
$\{1, x, x^2, \ldots\}$. Orthogonal polynomials and complex exponentials are two
systems that have been studied extensively. The Jacobi polynomials
provide a general system of orthogonal polynomials to consider, with
Legendre, Chebyshev, and Laguerre polynomials being special or
limiting cases (Lanczos, 1956). Complex exponential systems are the
basis of Fourier series expansions (Davis, 1975; Churchill, 1969).

For statistical applications, one may seek to estimate a variety
of functions using orthogonal expansions. Of primary importance,
however, is estimating the c.d.f. or p.d.f. that generates a set of
data. One may also estimate the quantile function using these
techniques, and some authors base an approach on estimating
characteristic functions (see, e.g., Watson and Leadbetter, 1963). We
will consider several approaches using orthogonal expansions to
estimate an unknown p.d.f. either directly or indirectly. The
assumption of a finite support for f is not overly restrictive as one
essentially is estimating the truncated density given by

$$f(x) = g(x) \Big/ \int_a^b g(x) \, dx, \quad a \le x \le b.$$

Furthermore, transforming a data set to fall in the interval [a,b] is
not difficult and has little if any effect on most estimates obtained.
Some systems, such as Hermite polynomials, permit expansion over the
entire real line, but one is always concerned about extrapolation
problems when dealing with a finite data set.

Cencov (1962, taken from Bean and Tsokos, 1980) considers the
expansion (3.4.5) where the system $\{\phi_k(x)\}$ is orthogonal w.r.t. a
weight function w(x), i.e.,

$$\int_a^b \phi_i(x) \phi_j(x) w(x) \, dx = \delta(i,j), \tag{3.4.6}$$

where $\delta(i,j)$ is Kronecker's delta. The coefficients $\{\theta_k\}_{k=-\infty}^{\infty}$ are
given by

$$\theta_k = \int_a^b f(x) \phi_k(x) w(x) \, dx \tag{3.4.7}$$

and as mentioned in section 2.4, these are usually called Fourier
coefficients. Cencov (1962) then obtains estimates for these
coefficients based on the empirical c.d.f. From a random sample of
size n he obtains

$$\hat{\theta}_k = \int_a^b \phi_k(x) w(x) \, dF_n(x) = (1/n) \sum_{j=1}^{n} \phi_k(X_j) w(X_j). \tag{3.4.8}$$

The estimates given by (3.4.8) have some nice properties.
Observe,

$$E(\hat{\theta}_k) = (1/n) \sum_{j=1}^{n} E[\phi_k(X_j) w(X_j)] = \int_a^b \phi_k(x) w(x) \, dF(x) = \theta_k$$

and hence $\hat{\theta}_k$ is unbiased for estimating $\theta_k$. Furthermore, by the SLLN,
$\hat{\theta}_k \to \theta_k$ a.s. This implies that for a finite parameter p.d.f. f(x) with
expansion (3.4.5) and $\theta_j = 0$ for $|j| > m$,

$$\hat{f}_n(x) = \sum_{k=-m}^{m} \hat{\theta}_k \phi_k(x), \quad a \le x \le b, \tag{3.4.9}$$

is unbiased and consistent for estimating f(x). However, $\hat{f}_n(x)$ will
always be biased for estimating infinite parameter models of the form
(3.4.5). In this case, the problem then becomes one of choosing the
"best" order m such that $\hat{f}_n(x)$ in (3.4.9) is a reasonably good
estimate of f(x).
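
The Cencov estimates (3.4.8) and the series estimator (3.4.9) can be sketched
as follows. This is a minimal example under stated assumptions: the interval
is [0,1], the weight function is w(x) = 1, and the orthonormal system is the
cosine system $\phi_0(x) = 1$, $\phi_k(x) = \sqrt{2}\cos(k\pi x)$, indexed from 0 to m rather
than from -m to m. The function names and simulated data are illustrative.

```python
import numpy as np

def cosine_basis(k, x):
    """Orthonormal cosine system on [0, 1] with weight w(x) = 1 (assumed basis)."""
    x = np.asarray(x, dtype=float)
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)

def cencov_coefficients(sample, m):
    """theta_hat_k = (1/n) * sum_j phi_k(X_j), for k = 0, ..., m, as in (3.4.8)."""
    return np.array([cosine_basis(k, sample).mean() for k in range(m + 1)])

def series_density(x, coefficients):
    """Order-m series estimate sum_k theta_hat_k * phi_k(x), as in (3.4.9)."""
    return sum(c * cosine_basis(k, x) for k, c in enumerate(coefficients))

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 2.0, size=1000)         # illustrative data on [0, 1]
coefficients = cencov_coefficients(sample, m=4)
midpoints = (np.arange(400) + 0.5) / 400
density = series_density(midpoints, coefficients)
```

The estimate integrates to one because $\hat{\theta}_0 = 1$ and the higher cosine terms
integrate to zero, but nothing in the construction prevents it from taking
negative values for an unlucky choice of m.
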

One might observe that the Cencov approach is a method of moments
estimating scheme. If one rewrites (3.4.7) in terms of expectations,
one obtains

$$\theta_k = E[\phi_k(X) w(X)], \tag{3.4.10}$$

and hence (3.4.8) is merely a method of moments estimator of $\theta_k$. As
mentioned in Chapter 2, many estimators based upon the e.d.f. fall
into this category.

Other authors have examined Cencov's technique for specific
systems of orthogonal functions. Schwartz (1967) considers expansions
based on Hermite polynomials and obtains asymptotic results
competitive with kernel estimators. Walter (1977) obtains further
results based on this technique. Kronmal and Tartar (1968, 1976)
consider trigonometric systems, and Crain (1974) uses Legendre
polynomials. Anderson (1969) indicates that the Kronmal-Tartar
estimators seem to perform better than the Schwartz estimators based
on Monte Carlo studies. This would imply that the choice of
orthogonal system is crucial to the estimation procedure. Since the
literature abounds with various orthogonal expansion techniques, only
a few of the more promising ones will be considered.

Kronmal and Tartar (1968) consider estimation techniques based on
Fourier series expansions of the c.d.f. F(x) and the p.d.f. f(x).
They consider estimators of the form

$$\hat{f}_m(x) = \sum_{j=-m}^{m} \hat{a}_j \phi_j(x), \quad a \le x \le b \tag{3.4.11}$$

and

$$\hat{F}_m(x) = \sum_{j=-m}^{m} \hat{A}_j \phi_j(x), \quad a \le x \le b \tag{3.4.12}$$

where $\{\phi_j\}$ satisfy (3.4.6). Using trigonometric systems of
orthogonal functions they obtain

$$\hat{f}_m(x) = c_0/2 + \sum_{k=1}^{m} c_k \cos[k\pi(x-a)/(b-a)] \tag{3.4.13}$$

where

$$c_k = \frac{2}{(b-a)n} \sum_{j=1}^{n} \cos[k\pi(X_j - a)/(b-a)] \, I_A(X_j), \tag{3.4.14}$$

where A = [a,b]. This estimate is a derivative of an estimate of F(x)
obtained by the expansion (3.4.12) using Cencov type estimates of the
coefficients, i.e., using Fourier coefficients based on the e.d.f.
Modifications are also suggested to ensure that the density estimate
is positive. Tartar and Kronmal (1976) also consider a similar
approach based on complex exponentials (i.e., the complex form of the
trigonometric systems). From model (3.4.11) they obtain the Cencov
type estimates

$$\hat{a}_k = (1/n) \sum_{j=1}^{n} \exp(-2\pi i k X_j), \quad k = -m, \ldots, m, \tag{3.4.15}$$
for the Fourier coefficients. Using an analogy to stepwise regression
and the MISE criterion, they omit coefficients $\hat{a}_k$ and $\hat{a}_{-k}$ if

$$\hat{a}_k \hat{a}_{-k} < 2/(n+1) \tag{3.4.16}$$

and terminate the order of approximation when K consecutive
coefficients are deemed not significantly different from zero. For
practical applications, Tartar and Kronmal suggest letting the maximum
order of expansion be m = 10, and suggest that K = 1 or 2. Large values
of m or K usually produce very wiggly estimates of the p.d.f. The
MISE stopping rule suggested by Tartar and Kronmal is similar to
Parzen's CAT criterion except it emphasizes the contribution of
parameters whereas the CAT criterion emphasizes the reduction in
residual variance. Such stopping rules add a degree of objectivity to
an otherwise subjective endeavor, but detractors often question the ad
hoc nature of the criterion functions.
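
The coefficient estimates (3.4.15) and the inclusion rule (3.4.16) can be
sketched as below for data on [0,1]. This is one reading of the stopping
rule, not a transcription of the Tartar and Kronmal algorithm: terms failing
(3.4.16) are skipped, and the search stops after K consecutive failures. The
function names and simulated data are assumptions. Since $\hat{a}_{-k}$ is the complex
conjugate of $\hat{a}_k$, the product in (3.4.16) equals $|\hat{a}_k|^2$.

```python
import numpy as np

def fourier_coefficients(sample, m):
    """a_hat_k = (1/n) * sum_j exp(-2 pi i k X_j) for k = 0, ..., m, as in (3.4.15)."""
    k = np.arange(m + 1)
    return np.exp(-2j * np.pi * np.outer(k, sample)).mean(axis=1)

def select_terms(a, n, K=2):
    """Keep k when |a_k|^2 >= 2/(n+1); stop after K consecutive omissions."""
    keep, misses = [0], 0
    for k in range(1, len(a)):
        if abs(a[k]) ** 2 < 2.0 / (n + 1):     # rule (3.4.16)
            misses += 1
            if misses >= K:
                break
        else:
            misses = 0
            keep.append(k)
    return keep

def fourier_density(x, a, keep):
    """Sum of the retained terms; the pair (k, -k) contributes 2 Re[a_k exp(2 pi i k x)]."""
    x = np.asarray(x, dtype=float)
    f = np.full(x.shape, a[0].real)            # a_0 = 1
    for k in keep:
        if k > 0:
            f = f + 2.0 * (a[k] * np.exp(2j * np.pi * k * x)).real
    return f

rng = np.random.default_rng(0)
sample = rng.normal(0.5, 0.1, size=500)        # illustrative data, effectively in [0, 1]
a = fourier_coefficients(sample, m=10)
keep = select_terms(a, n=sample.size, K=2)
grid = np.arange(512) / 512
density = fourier_density(grid, a, keep)
```
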

Crain (1974) uses Legendre polynomials as an orthogonal basis
set, but he chooses to expand log f(x) instead of f(x) or F(x). Let
f(x) be continuous and strictly positive such that

$$\log f(x) = \sum_{k=1}^{\infty} \theta_k \phi_k(x) - C(\theta), \quad a \le x \le b, \tag{3.4.17}$$

where $\{\phi_k(x)\}_{k=1}^{\infty}$ are the Legendre polynomials over [-1,1] and $C(\theta)$
is an integrating factor insuring that f(x) integrates to one.
Consider order m approximators

$$f_m(x) = \exp\Big\{ \sum_{k=1}^{m} \theta_k^* \phi_k(x) - C(\theta^*) \Big\}, \quad a \le x \le b, \tag{3.4.18}$$

where m and $\{\theta_k^*\}$ are determined by some suitable criterion. Crain
uses the criterion of maximum likelihood and establishes conditions
that ensure a unique solution vector $\theta^*$ exists for the representation
(3.4.18). One observes that (3.4.18) is the canonical exponential
model representation of a density belonging to a finite parameter
exponential family. Furthermore, the expansion of log f(x) rather
than f(x) insures that $f_m(x)$ will be positive. One may then treat
$\exp\{-C(\theta^*)\}$ as an integrating factor to insure that $f_m(x)$
numerically integrates to one.
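
A minimal sketch of fitting (3.4.18) by maximum likelihood is given below.
It is not Crain's algorithm: it assumes the data have already been rescaled
to [-1,1], evaluates integrals by the trapezoid rule on a fixed grid, and
uses a plain Newton iteration on the likelihood equations, which for an
exponential family equate the model means of the $\phi_k$ to their sample
means. All names and tuning constants are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def fit_log_density(sample, m, grid_size=2001, max_iter=50, tol=1e-10):
    """Fit log f(x) = sum_k theta_k P_k(x) - C(theta) on [-1, 1] by Newton's method."""
    sample = np.asarray(sample, dtype=float)
    grid = np.linspace(-1.0, 1.0, grid_size)
    weights = np.full(grid_size, grid[1] - grid[0])
    weights[[0, -1]] *= 0.5                                  # trapezoid rule weights
    basis = legendre.legvander(grid, m)[:, 1:]               # P_1, ..., P_m on the grid
    target = legendre.legvander(sample, m)[:, 1:].mean(axis=0)   # sample means of P_k
    theta = np.zeros(m)
    for _ in range(max_iter):
        unnormalized = np.exp(basis @ theta)
        density = unnormalized / np.sum(weights * unnormalized)
        mass = weights * density
        mean = basis.T @ mass                                # model means of P_k
        centered = basis - mean
        cov = centered.T @ (centered * mass[:, None])        # model covariance of P_k
        step = np.linalg.solve(cov, target - mean)           # Newton step
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    unnormalized = np.exp(basis @ theta)
    c_theta = np.log(np.sum(weights * unnormalized))         # integrating factor C(theta)
    return theta, c_theta, grid, np.exp(basis @ theta - c_theta)

rng = np.random.default_rng(0)
sample = 2.0 * rng.beta(2.0, 2.0, size=2000) - 1.0           # illustrative data on [-1, 1]
theta, c_theta, grid, density = fit_log_density(sample, m=2)
```

The fitted density is positive by construction and integrates to one on the
grid, which illustrates the two advantages of expanding log f(x) noted above.
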

Sillitto (1969) uses Legendre polynomials shifted to [0,1] to
expand the quantile function in a Fourier series and suggests using
linear combinations of order statistics to obtain estimates of
parameters. Let $X_{1n}, X_{2n}, \ldots, X_{nn}$ be the order statistics from a random
sample of size n with strictly increasing (absolutely continuous)
c.d.f. F(x). Let $\xi_{pn} = E(X_{pn})$ be the expectation of the p-th order
statistic in a sample of size n. Then

$$Q(u) = \sum_{j=1}^{\infty} (2j-1) \lambda_j P^*_{j-1}(u) \tag{3.4.19}$$

where

and $P^*_{j-1}(u)$ is the shifted Legendre polynomial of degree j-1. A
natural estimator of the $\{\lambda_j\}$ is provided by

(3.4.21)

where $X_{j-k,j} = X_{j-k,n}$ (i.e., treating the first j order statistics as
if from a sample of size j). Thus, $\hat{\lambda}_j$ is a linear combination of
order statistics whose properties are discussed in Sillitto (1969).
From the $\{\hat{\lambda}_j\}$ one obtains

$$\hat{Q}(u) = \sum_{j=1} (2j-1) \hat{\lambda}_j P^*_{j-1}(u). \tag{3.4.22}$$

To obtain a density estimator, first compute

$$\hat{q}(u) = n\{\hat{Q}[(j+.5)/(n+1)] - \hat{Q}[(j-.5)/(n+1)]\}, \quad (j-.5)/(n+1) \le u < (j+.5)/(n+1), \tag{3.4.23}$$

as a raw derivative of $\hat{Q}(u)$ and then use the reciprocal identity
[equation (2.2.4)] to obtain

$$\hat{f}\hat{Q}(u) = 1/\hat{q}(u), \tag{3.4.24}$$

which can be plotted for $x = \hat{Q}(u)$ abscissa values to look like a density
rather than a density-quantile function if desired.
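
The last two steps, (3.4.23) and (3.4.24), apply to any quantile function
estimate. The sketch below assumes the quantile function is supplied as a
Python callable on [0,1] and uses the multiplier n exactly as written in
(3.4.23); the function names and the exponential test case are illustrative
assumptions, not part of Sillitto's procedure.

```python
import math

def raw_quantile_density(Q, u, n):
    """Raw derivative of Q at u over the cell containing u, as in (3.4.23)."""
    j = math.floor(u * (n + 1) + 0.5)      # (j - .5)/(n + 1) <= u < (j + .5)/(n + 1)
    return n * (Q((j + 0.5) / (n + 1)) - Q((j - 0.5) / (n + 1)))

def density_quantile(Q, u, n):
    """Reciprocal identity fQ(u) = 1 / q(u), as in (3.4.24)."""
    return 1.0 / raw_quantile_density(Q, u, n)

def exponential_quantile(u):
    """Unit exponential quantile function, for which fQ(u) = 1 - u."""
    return -math.log(1.0 - u)

fq_half = density_quantile(exponential_quantile, 0.5, n=1000)
```

For the unit exponential the result at u = 0.5 is close to the exact value
1 - u = 0.5, with a small discrepancy from the finite cell width and the use
of n rather than n + 1 as the multiplier.
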

The nature of functional approximation techniques can lead to a
variety of solutions based on the nature of the expansion and the
estimation criterion used. An annoyance is the necessity of
considering estimates only in the interval [a,b], but for most
applications this poses no real problem. One may wish to investigate
which expansions are optimal for specified distributions possessing
properties such as symmetry, skewness, wide tails, etc. However, this
may be a difficult task with little reward as suggested by some of the
simulation studies that have already been performed. Since the
primary goal is to estimate an unknown density, one should seek a
procedure that performs well for a large variety of probability models.
The class $L^2(a,b)$ provides such a large collection of interesting
models, and hence the techniques developed in this section should be
competitive for a wide range of parent distributions. An extension of
some of the techniques of this section will be considered in section
4.3. For the basic asymptotic results of any particular density
estimator discussed in this section, one is referred to the citation
corresponding to that procedure. We will have little use of these
results for the applications and extensions to be considered later.

3.5  The  Autoregressive  Approach 

The autoregressive approach to density estimation due to
Carmichael (1976) and Parzen (1979b) is based upon an analogy between
the spectral density of an autoregressive time series and the
probability density of a random variable. A density f(u), $0 \le u \le 1$, is
said to have an autoregressive representation of order m if it is of
the form

$$f(u) = K \Big| \sum_{j=0}^{m} a(j) \exp(2\pi i j u) \Big|^{-2}, \tag{3.5.1}$$

where a(0) = 1, m is a positive integer, K is a positive constant, and
a(1),...,a(m) are complex valued coefficients such that

$$g(z) = 1 + a(1) z + \cdots + a(m) z^m \tag{3.5.2}$$

has all of its roots outside the unit circle. Parzen (1979b)
considers the autoregressive representation of the density-quantile
function fQ(u).

Analogous to parameter estimation for an autoregressive time
series, one estimates the parameters a(1),...,a(m) via the Yule-Walker
equations

$$\begin{bmatrix} R(0) & R(1) & \cdots & R(m-1) \\ R(-1) & R(0) & \cdots & R(m-2) \\ \vdots & & \ddots & \vdots \\ R(1-m) & \cdots & & R(0) \end{bmatrix} \begin{bmatrix} a(1) \\ a(2) \\ \vdots \\ a(m) \end{bmatrix} = - \begin{bmatrix} R(-1) \\ R(-2) \\ \vdots \\ R(-m) \end{bmatrix} \tag{3.5.3}$$

where R(v) is the Fourier-Stieltjes transform of F(x),

$$R(v) = \int_0^1 \exp(2\pi i v x) \, dF(x), \quad |v| = 0, 1, 2, \ldots. \tag{3.5.4}$$

One estimates R(v) by

$$\hat{R}(v) = \int_0^1 \exp(2\pi i v x) \, dF_n(x) \tag{3.5.5}$$

and obtains $\hat{a}(1),\ldots,\hat{a}(m)$ by solving (3.5.3). One also obtains

$$\hat{K}_m = \sum_{j=0}^{m} \hat{a}(j) \hat{R}(j), \tag{3.5.6}$$

where one takes $\hat{a}(0) = 1$. The constant $\hat{K}_m$ becomes an integrating
factor, but it corresponds to the prediction variance of an
autoregressive time series. In the case of density estimation, $\hat{K}_m$
will be interpreted as a residual variance to facilitate an objective
procedure for determining the best approximating order m. One selects
order m such that Parzen's criterion autoregressive transfer function,
given by

$$\mathrm{CAT}(m) = (1/n) \sum_{j=1}^{m} \hat{K}_j^{-1} - \hat{K}_m^{-1}, \tag{3.5.7}$$

achieves its minimum at m.

Carmichael (1976) gives conditions for the convergence of

$$\hat{f}_m(x) = \hat{K}_m \Big| \sum_{j=0}^{m} \hat{a}(j) \exp(2\pi i j x) \Big|^{-2} \tag{3.5.8}$$

to the true density f(x). He also relates the autoregressive
representation to an approximation in a reproducing kernel Hilbert
space using eigenfunctions and eigenvalues corresponding to the
reproducing kernel R(v). For more insight into this interpretation,
see Parzen (1959, 1967) and Bochner (1955).

Parzen (1979b) develops a goodness-of-fit procedure using the
autoregressive technique on a uniform density d(u). First, observe
that under $H_0: fQ(u) = f_0Q_0(u)$ for some specified $f_0Q_0(u)$, the density
d(u) defined by

\[ d(u) = f_0Q_0(u) / fQ(u), \qquad 0 \le u \le 1, \tag{3.5.9} \]

is a uniform density over [0,1]. One can then develop a
goodness-of-fit procedure based on the sample uniform density defined
by

\[ \tilde{d}(u) = f_0Q_0(u) / \tilde{fQ}(u), \qquad 0 \le u \le 1, \tag{3.5.10} \]

for some estimate $\tilde{fQ}(u)$ and null value $f_0Q_0(u)$. Parzen develops an
autoregressive estimator

\[ \hat{d}_m(u) = \hat{K}_m \left| \sum_{j=0}^{m} \hat{a}(j) \exp(2\pi i j u) \right|^{-2}, \qquad 0 \le u \le 1, \tag{3.5.11} \]

where

\[ \hat{K}_m = \int_0^1 \left| \sum_{j=0}^{m} \hat{a}(j) \exp(2\pi i j u) \right|^{2} \tilde{d}(u)\, du, \tag{3.5.12} \]

\[ \tilde{d}(u) = f_0Q_0(u)\, \tilde{q}(u) / \tilde{\sigma}_0, \tag{3.5.13} \]

with $\tilde{\sigma}_0$ serving as an integrating factor and $\tilde{q}(u)$ representing the
empirical quantile-density given by equation (2.2.41). The values
$\hat{a}(1), \ldots, \hat{a}(m)$ are derived from the Yule-Walker equations using

\[ \tilde{D}(u) = \int_0^u \tilde{d}(t)\, dt, \qquad 0 \le u \le 1, \tag{3.5.14} \]

in place of $F_n(x)$ in equation (3.5.5). One may then use $\hat{d}_m(u)$ to form
test statistics for testing $H_0: fQ(u) = f_0Q_0(u)$ against specified
alternatives. The estimate $\hat{d}_m(u)$ leads to an estimate of fQ(u) given
by

\[ \hat{fQ}(u) = f_0Q_0(u) / \hat{d}_m(u), \qquad 0 \le u \le 1, \tag{3.5.15} \]

which is based on the representation (3.5.10). Observe that this
estimate of fQ(u) is "weighted" by the null density-quantile $f_0Q_0(u)$.
Parzen (1979b) suggests that using the normal density-quantile for
$f_0Q_0(u)$ provides an essentially nonparametric procedure in that a
variety of distributional shapes may still be discovered using this
symmetric "smoothing" function.

One  of  the  drawbacks  to  the  autoregressive  approach  is  the 
difficulty  in  justifying  its  use  in  an  intuitive  fashion  to  persons 
ignorant  of  autoregressive  time  series  modeling.  However,  as  Parzen 
(1979b)  observes,  the  knowledge  of  time  series  analysis  is  not 
essential  for  one  to  be  able  to  apply  the  procedure.  The 
autoregressive  approach  also  seems  to  be  a  monster  of  computational 
complexity,  but  many  of  the  computational  problems  have  been  overcome 
by  numerical  analysts. 

There are many advantages to the autoregressive approach to
density estimation:

1) It provides an objective means of determining the amount of
smoothing required.

2) It provides an abundance of goodness-of-fit diagnostics for a
specified null distribution.

3) It has desirable asymptotic properties and seems to perform well
for small samples.

4) Computer software is available implementing the procedure (Parzen
and Anderson, 1980).

The  objective  determination  of  the  smoothing  order  is  further  enhanced 
in  that  it  is  intuitively  justified  by  the  autoregressive  model 
interpretation  of  the  CAT  function.  One  disadvantage  to  the 
autoregressive  approach  is  that  it  may  not  be  extendable  to  the 
multivariate  case.  With  this  in  mind  a  comparable  procedure  is 
developed  in  the  next  chapter  that  readily  extends  to  the  bivariate 
case. 

3.6  Other  Approaches 

In  this  section  we  briefly  mention  techniques  that  in  some  cases 
are  variants  of  the  three  previous  techniques  mentioned. 

The  spline  method  may  be  considered  as  an  extension  of  the  kernel 
method  with  additional  restrictions  made  to  determine  the  type  of 
smoothing  desired  and  the  class  of  spline  functions  to  be  employed. 
Wahba  (1971)  considers  smoothing  the  empirical  c.d.f.  or  the  empirical 
quantile  function  and  then  differentiates  the  smoothed  estimators  to 


obtain  an  estimator  for  the  density.  A  selling  feature  of  this 
technique  is  improved  rates  of  convergence  in  mean  square  over  the 
kernel  method. 

The technique of discrete maximum penalized likelihood (DMPL)
estimation as presented by Tapia and Thompson (1978) uses an approach
that is a combination of kernel and spline methodologies employing a
discrete approximation to a likelihood functional. The resulting
estimator is a maximum likelihood estimator (m.l.e.) of a criterion
function with an arbitrary smoothness parameter. The object to be
maximized is a discrete approximation to the functional

\[ L(f) = \prod_{i=1}^{n} f(X_i) \exp\left( -\alpha \int_{-\infty}^{\infty} [f''(t)]^2\, dt \right). \tag{3.6.1} \]

Tapia and Thompson present results for this approach along with
suggestions for multivariate extensions.

The reciprocal identity employed by Sillitto in section 3.4 is
also the basis for the estimator proposed by Bloch and Gastwirth
(1968). Their estimate is simply the reciprocal of a raw estimator of
q(u) = Q'(u) similar to equation (2.2.41). Their goal concerns
asymptotic variance estimation for sample quantiles, and hence they
are concerned with pointwise estimation rather than evaluating shapes.

There are many techniques for nonparametric density estimation
with each attempting to display some statistical or computational
advantage. The references mentioned in the first section of this
chapter discuss most of the existing techniques and provide a more
comprehensive exposition than contained in this section. Our goal has
been to outline the major classifications of density estimation
procedures so as to provide a framework in which to make comparisons
with a new technique to be developed in the next chapter. Some
comments along these lines are offered in the next section.

3.7 Concluding Remarks

Appraising nonparametric density estimation techniques involves
consideration of estimation criterion, robustness, small sample
performance, and the nature of the statistical problem of interest.
Some techniques may be exceptional for pointwise approximation of a
density but lacking when shapes, tail areas, etc., are important. For
example, DMPL estimation seems to provide good estimates at a grid of
mesh points but somewhat artificially provides the shape of a density.
Specifying more mesh points increases computational problems and slows
convergence of the algorithm. The emphasis on robustness may hinder
evaluating the nature of the tails of a density. Small sample
properties may appear satisfactory in simulations, but the problem
remains that often small samples do not contain enough information to
diagnose weaknesses in the estimate obtained.

Perhaps the most critical problem is the existence of smoothing
parameters or orders that must be dealt with in a subjective fashion.
The autoregressive technique and the Tartar-Kronmal orthogonal
expansion technique suggest criteria for obtaining optimal orders, but
further research is warranted into the development of meaningful order
determining criteria. However, one may question whether model
selection should be made completely automatic, as such an approach
might prevent examining interesting models that may have more
theoretical motivation. Automation of a technique could destroy its
usefulness in an exploratory analysis.

Another consideration is computational efficiency in light of
asymptotic requirements placed on smoothing parameters. One may find
it difficult to translate asymptotic restrictions into computer code.
Often an upper (or lower) bound is programmed into a procedure so that
asymptotic conditions cannot be made to hold, but it would be rather
foolish to pay too much attention to this matter since very large data
sets may be a rarity. Tartar and Kronmal (1976) suggest that a
maximum order of 10 will be adequate for most data sets encountered.
The BISAM program discussed in Chapter 5 currently restricts one to a
maximum order of 7 which seems adequate for most data sets. The
legitimacy of such program restrictions is illustrated by Table 1
which shows some values for common bin width and order parameters as a
function of sample size to accommodate the asymptotic theory.
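
The entries of Table 1 are elementary functions of n and can be recomputed directly. The short sketch below does so; the function name `smoothing_orders` is hypothetical and natural logarithms are assumed, which is what the tabled values imply.

```python
import math

def smoothing_orders(n):
    """Return (log n, log log n, sqrt n, n^(1/3)) for a sample size n."""
    return (math.log(n), math.log(math.log(n)), math.sqrt(n), n ** (1.0 / 3.0))

# Print one row per sample size, rounded to two decimals as in Table 1.
for n in (20, 50, 100, 500, 1000, 10000, 100000, 1000000):
    print(n, " ".join(f"{value:9.2f}" for value in smoothing_orders(n)))
```

Even at n = 1000000 the slowly growing orders log(n) and log(log(n)) stay below 14, which is the point the paragraph above makes about bounded orders in programs.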

As  a  final  note,  the  observation  is  made  that  much  of  the 
literature  emphasizes  asymptotic  properties  paying  little  attention  to 
the  practicality  of  a  procedure.  While  asymptotic  properties  are 
desirable,  they  are  worthless  when  an  unmanageable  algorithm  is 
required  to  perform  the  necessary  computations.  Unfortunately,  this 
attitude  may  be  carried  to  extremes  as  indicated  by  the  overwhelming 
popularity  of  histograms.  One  naturally  attempts  to  seek  a  balance 
between  theory  and  computational  efficiency.  This  philosophy  is 
exemplified in the methodology and computer software development

Table 1. Asymptotic Smoothing Orders as a Function of Sample Size

      n     log(n)   log(log(n))   SQRT(n)   n**(1/3)

     20      3.00       1.10          4.47      2.71
     50      3.91       1.36          7.07      3.68
    100      4.61       1.53         10.00      4.64
    500      6.21       1.83         22.36      7.94
   1000      6.91       1.93         31.62     10.00
  10000      9.21       2.22        100.00     21.54
 100000     11.51       2.44        316.23     46.42
1000000     13.82       2.63       1000.00    100.00

4. BIVARIATE STATISTICAL DATA MODELING

4.1  Introduction 

Many  of  the  density  estimation  techniques  of  Chapter  3  had 
multivariate  extensions.  Bivariate  density  estimation  will  provide 
the  framework  for  the  methods  of  bivariate  data  analysis  that  will  be 
developed  in  later  sections.  The  usual  problems  of  multivariate 
analysis,  however,  will  present  obstacles  to  direct  extension  of 
univariate  techniques.  One  has  difficulty  in  ordering  vectors  in 
higher  dimensional  spaces  as  well  as  defining  multivariate 
counterparts  to  univariate  functions.  Estimating  derivatives  of 
empirical  distribution  functions  is  made  more  difficult  and  any 
smoothing must be accomplished for several dimensions. Graphical
displays must be broken into component parts for more than three
dimensions. Critical regions are more difficult to derive and power
considerations  for  tests  of  hypotheses  may  be  theoretically 
impossible. 

Our  emphasis  has  been  on  function  estimation  and  graphical 
display.  For  a  multivariate  problem  of  more  than  three  dimensions, 
one  may  seek  to  break  up  the  problem  into  components  involving  three 
dimensions  or  less.  As  an  analogy,  recall  that  the  analysis  of 
variance  may  be  treated  as  multiple  t-tests  for  pairwise  comparisons. 
Naturally,  one  would  only  recommend  such  an  attack  if  the  higher 
dimensional  problem  had  no  solution  or  was  too  difficult  to  implement, 
which is not the case in the analysis of variance analogy. Where
multivariate extensions are not possible, we will recommend treating a
multivariate problem as a composition of bivariate problems to be
handled by techniques developed in this chapter.

4.2  Normal  Theory 

As previously mentioned, the usual first step in testing a
procedure is to see how it compares to the normal theoretic techniques
when data is generated by a normal probability mechanism. In this
section, existing normal theoretic results will be examined. For
references, Rao (1973), Kshirsagar (1972), or Graybill (1976) provide
basic information on existing normal theoretic methods.

Recall, a random p-vector $X = (X_1, X_2, \ldots, X_p)'$ has a multivariate
normal distribution if its p.d.f. is given by

\[ f_X(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x-\mu)' \Sigma^{-1} (x-\mu) \right\} \tag{4.2.1} \]

where $\mu = E(X) = (E(X_1), \ldots, E(X_p))'$ and $\Sigma = (\mathrm{Cov}(X_i, X_j))$. The case p = 2
reduces to

\[ f_{X,Y}(x,y) = (2\pi)^{-1} \left( \sigma_X \sigma_Y \sqrt{1-\rho^2} \right)^{-1} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho (x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} \right] \right\} \tag{4.2.2} \]

where $-\infty < \mu_X, \mu_Y < \infty$, $\sigma_X > 0$, $\sigma_Y > 0$, and $-1 < \rho < 1$. If $\rho = 0$, one can write
$f_{X,Y}(x,y)$ as the product of $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$ p.d.f.'s, implying
that X and Y are independent if $\rho = 0$. This suggests the following
well known theorem.

Theorem 4.2.1 If (X,Y) is a bivariate normal random vector, then
the correlation between X and Y is zero if and only if X and Y are
independent.

This theorem is the basis for many tests of independence, but often
its generalization to a nonparametric setting negates the "only if"
part of the theorem.

In  the  parameterization  (4.2.2)  one  may  seek  statistical 
estimators  of  p  and  examine  their  properties.  These  statistics  may 
then  be  employed  in  testing  procedures  to  test  for  independence. 

The usual approach is to use Pearson's product moment sample
correlation coefficient

\[ r = \frac{\sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{n} (X_k - \bar{X})^2 \sum_{k=1}^{n} (Y_k - \bar{Y})^2}} \tag{4.2.3} \]

which is the maximum likelihood estimate for $\rho$ given a random sample
$(X_1, Y_1), \ldots, (X_n, Y_n)$ from a bivariate normal distribution. Under
$H_0: \rho = 0$, one has

\[ r \sqrt{n-2} \Big/ \sqrt{1-r^2} \sim t(n-2). \tag{4.2.4} \]

For testing $\rho = 0$ vs. $\rho \neq 0$ one rejects $H_0$ at level $\alpha$ if

\[ |r| \sqrt{n-2} \Big/ \sqrt{1-r^2} > t(\alpha/2;\, n-2). \tag{4.2.5} \]

The usual nonparametric approaches to tests of independence are
patterned after these normal concepts, i.e., they seek to estimate a
population correlation coefficient nonparametrically and base tests
on this estimate. Unfortunately, most nonparametric approaches must
assume bivariate normality for this approach to be a legitimate test
of independence. Huber (1981) notes that r in (4.2.3) is
distribution-free under an assumption less restrictive than
independence of the bivariate observation (namely, exchangeability of
the joint n-vector of X or Y values is assumed), but he observes that
r is not invariant to monotone transformations and is very sensitive
to outliers in the data. The approaches considered in the next
section attempt to overcome such weaknesses.
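
The normal theory test of (4.2.3) through (4.2.5) reduces to a few array operations. The following is a minimal sketch; the function name `pearson_t_statistic` is hypothetical, and the critical value t(α/2; n-2) is left to a t table or a statistics library.

```python
import numpy as np

def pearson_t_statistic(x, y):
    """Return Pearson's r (4.2.3) and the statistic r*sqrt(n-2)/sqrt(1-r^2) of (4.2.4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx, dy = x - x.mean(), y - y.mean()
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    t_stat = r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)
    return r, t_stat
```

One rejects independence at level α when the absolute value of the returned statistic exceeds the upper α/2 point of the t distribution with n-2 degrees of freedom. If SciPy is available, that point can be obtained from `scipy.stats.t.ppf(1 - alpha / 2, n - 2)`.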

4.3 Some Concepts, Measures, and Tests of Independence

The primary reference for this section is Lehmann (1966). A
brief discussion of the more common nonparametric tests of
independence will be followed by a discussion of some useful concepts
and measures related to testing for independence. As usual, tests
will be based on a bivariate random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ with
assumptions concerning the bivariate distribution of (X,Y).

The discussion of simple linear rank statistics of the form

\[ S_n = \sum_{k=1}^{n} c(Q_k)\, a(R_k), \tag{4.3.1} \]

where $Q_k = \mathrm{rank}(X_k)$, $R_k = \mathrm{rank}(Y_k)$, and the functions c(i) and a(i)
are specified score functions, may be found in Serfling (1980) or
Ruymgaart (1974). To test the hypothesis $H_0: \rho = 0$ vs. some alternative
for a suitably defined correlation coefficient $\rho$ one may consider
estimators of the form

\[ T_n = \frac{\sum_{k=1}^{n} [c(Q_k) - \bar{c}][a(R_k) - \bar{a}]}{\sqrt{\sum_{k=1}^{n} [c(Q_k) - \bar{c}]^2 \sum_{k=1}^{n} [a(R_k) - \bar{a}]^2}} \tag{4.3.2} \]

where $\bar{c} = (1/n) \sum_{k=1}^{n} c(Q_k)$ and $\bar{a} = (1/n) \sum_{k=1}^{n} a(R_k)$. Observe that $T_n$
depends on the sample only through the simple linear rank statistic
$S_n$. $T_n$ is defined analogously to Pearson's product moment sample
correlation coefficient, but it has the important additional feature
of being invariant to monotone transformations of the data since it
depends only on the ranks of the observations.

An important special case of (4.3.2) is given by Spearman's Rho.
Let a(i) = c(i) = i so that

\[ S_n = \sum_{k=1}^{n} Q_k R_k. \tag{4.3.3} \]

Using the convention of ordering X values and letting $R_k$ be the rank
of Y corresponding to $X_{(k)}$, (4.3.3) becomes

\[ S_n = \sum_{k=1}^{n} k R_k. \tag{4.3.4} \]

When X and Y are independent, $S_n$ is $AN(\mu_n, \sigma_n^2)$ where

\[ \mu_n = n(n+1)^2/4, \qquad \sigma_n^2 = n^2 (n^2-1)^2 / [144(n-1)]. \tag{4.3.5} \]

Define

\[ \hat{\rho}_n = 1 - 6 \sum_{k=1}^{n} (R_k - k)^2 / [n(n^2-1)] \tag{4.3.6} \]

which is the Spearman sample correlation coefficient, often called
Spearman's Rho (Spearman, 1904, as referenced in Randles and Wolfe,
1979). The population parameter that $\hat{\rho}_n$ is estimating is given by

\[ \rho = 3\, \mathrm{Cov}[\mathrm{sgn}(X_2 - X_1), \mathrm{sgn}(Y_3 - Y_1)]. \tag{4.3.7} \]

For testing $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ using $\hat{\rho}_n$, critical values may be
found in Table 10 of Conover (1971).
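
A small numerical sketch of (4.3.4) through (4.3.6) may help fix the notation. It assumes there are no ties in either coordinate; the function name `spearman_rho` and the returned standardized value are hypothetical conveniences, not part of the cited procedures.

```python
import numpy as np

def spearman_rho(x, y):
    """Return Spearman's Rho (4.3.6), S_n (4.3.4), and S_n standardized by (4.3.5)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    order = np.argsort(x)                          # order the X values
    r = np.argsort(np.argsort(y[order])) + 1       # R_k: rank of the Y paired with the k-th smallest X
    k = np.arange(1, n + 1)
    s_n = int(np.sum(k * r))                                     # (4.3.4)
    rho = 1.0 - 6.0 * np.sum((r - k) ** 2) / (n * (n ** 2 - 1))  # (4.3.6)
    mu = n * (n + 1) ** 2 / 4.0                                  # (4.3.5)
    var = n ** 2 * (n ** 2 - 1) ** 2 / (144.0 * (n - 1))
    return rho, s_n, (s_n - mu) / np.sqrt(var)
```

For the pairs (1,2), (2,1), (3,4), (4,3), (5,5) the ranks of Y in X order are 2, 1, 4, 3, 5, so the sum of squared rank differences is 4 and Rho is 1 - 24/120 = 0.8.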

Another popular nonparametric correlation coefficient is
Kendall's Tau (Kendall, 1938) based on the concepts of concordance and
discordance.

Definition 4.3.1 Two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ are concordant if
$(X_i - X_j)(Y_i - Y_j) > 0$ and discordant otherwise.

Define

\[ \tau = 2 P[(X_1 - X_2)(Y_1 - Y_2) > 0] - 1 \tag{4.3.8} \]

or equivalently

\[ \tau = \mathrm{Cov}[\mathrm{sgn}(X_2 - X_1), \mathrm{sgn}(Y_2 - Y_1)]. \tag{4.3.9} \]

Estimate $\tau$ by

\[ T_n = 2 U_n - 1 \tag{4.3.10} \]

where $U_n$ is a U-statistic defined by

\[ U_n = \binom{n}{2}^{-1} \sum\sum_{i<j} h[(X_i, Y_i), (X_j, Y_j)] \tag{4.3.11} \]

for the kernel h given by

\[ h[(X_i, Y_i), (X_j, Y_j)] = I[(X_i - X_j)(Y_i - Y_j)] = I[\mathrm{sgn}(Q_i - Q_j)\, \mathrm{sgn}(R_i - R_j)], \tag{4.3.12} \]

where $I(\cdot)$ is the indicator function defined to be 1 if the argument
is positive and 0 otherwise. When (X,Y) is a continuous random vector,
the theory of U-statistics yields the limiting null distribution of
$U_n$. When X and Y are independent, the number of concordant pairs minus
the number of discordant pairs, $\binom{n}{2} T_n$, satisfies

\[ \binom{n}{2} T_n \Big/ [n(n-1)(2n+5)/18]^{1/2} \rightarrow N(0,1). \tag{4.3.13} \]

Conover (1971) describes the usual testing procedure and illustrates
computational strategies for employing Kendall's Tau in tests of
independence. Table 11 of Conover (1971) gives critical values and a
description of the testing procedure for Kendall's Tau. Lehmann
(1966) relates this procedure to the difference sign covariance test
and suggests similar tests based on the ideas of concordance and
discordance. Hajek and Sidak (1967) show that the projection of
Kendall's Tau into the class of linear rank statistics is equivalent to
Spearman's Rho.
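
Counting concordant pairs directly gives a simple, if quadratic-time, sketch of the Kendall's Tau estimate and its large-sample null standardization. The function name `kendall_tau` is hypothetical, no ties are assumed, and the standardization uses the null variance n(n-1)(2n+5)/18 of the number of concordant pairs minus the number of discordant pairs.

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Return T_n = 2*U_n - 1 and its large-sample standardized value under independence."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    pairs = list(combinations(range(n), 2))
    # U_n is the proportion of concordant pairs (Definition 4.3.1).
    concordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    u_n = concordant / len(pairs)
    t_n = 2.0 * u_n - 1.0
    # len(pairs) * t_n equals concordant minus discordant pairs when there are no ties.
    z = len(pairs) * t_n / np.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    return t_n, z
```

For the pairs (1,2), (2,1), (3,4), (4,3), (5,5), eight of the ten pairs are concordant, so the estimate is 2(0.8) - 1 = 0.6.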

Blomqvist (1950) develops a procedure that counts the number of
data points lying in quadrants I or III when the origin is taken to be
$(X_0, Y_0)$. He considers the specific case where $X_0$ is the median of the
X's in the sample and $Y_0$ is the median of the Y's. Let

\[ q = P[(X - X_0)(Y - Y_0) > 0] - P[(X - X_0)(Y - Y_0) < 0] = 2 P[(X - X_0)(Y - Y_0) > 0] - 1 \tag{4.3.14} \]

and estimate q by

\[ Q = 2 U_n - 1 \tag{4.3.15} \]

where

classic U-statistic paper and presents the following alternative test
in the sequel to this paper (Hoeffding, 1948b). Let

\[ \Delta(F) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [F(x,y) - F(x,\infty) F(\infty,y)]^2\, dF(x,y). \tag{4.3.17} \]

Observe that $\Delta(F) = 0$ is equivalent to X and Y being independent.
Thus a test based on an estimator of the functional $\Delta(F)$ would be more
general than those described above. Let

\[ \psi(z_1, z_2, z_3) = I(z_1 - z_2) - I(z_1 - z_3) \tag{4.3.18} \]

and define the kernel h by

\[ h[(x_1, y_1), \ldots, (x_5, y_5)] = (1/4)\, \psi(x_1, x_2, x_3)\, \psi(x_1, x_4, x_5)\, \psi(y_1, y_2, y_3)\, \psi(y_1, y_4, y_5). \tag{4.3.19} \]

Then

\[ U_n = \binom{n}{5}^{-1} \sum h[(X_{i_1}, Y_{i_1}), \ldots, (X_{i_5}, Y_{i_5})] \tag{4.3.20} \]

is a U-statistic that is unbiased for estimating $\Delta(F)$. The theory
behind this approach requires only that X and Y be continuous random
variables. Hence, this is one of the most general nonparametric tests
of independence available. The generalization from a parameter $\rho$ to a
functional $\Delta(F)$ is a very important step in deriving new
nonparametric tests of independence that are more powerful than
existing procedures. This functional approach will be adopted in the
sequel using information functionals relevant to the problem of
interest.
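
To make the functional concrete, the sketch below evaluates it at the empirical c.d.f., that is, it computes the plug-in value obtained by replacing F with the empirical distribution in the defining integral. This is an assumption made for illustration only: the plug-in value is a biased estimate and is not Hoeffding's unbiased U-statistic, and the function name `hoeffding_delta_plugin` is hypothetical.

```python
import numpy as np

def hoeffding_delta_plugin(x, y):
    """Plug-in value of Delta(F): mean over i of [F_n(X_i,Y_i) - F_n(X_i,inf) F_n(inf,Y_i)]^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, j] is True when X_j <= X_i; likewise for le_y.
    le_x = x[None, :] <= x[:, None]
    le_y = y[None, :] <= y[:, None]
    joint = (le_x & le_y).mean(axis=1)   # F_n(X_i, Y_i)
    fx = le_x.mean(axis=1)               # F_n(X_i, infinity)
    fy = le_y.mean(axis=1)               # F_n(infinity, Y_i)
    return float(np.mean((joint - fx * fy) ** 2))
```

When Y is an increasing function of X the plug-in value approaches the integral of $(u - u^2)^2$ over [0, 1], which is 1/30, while for independent samples it is close to zero.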

Given the disparity between nonparametric tests of independence,
one is naturally concerned with the sensitivity of the various test
statistics to specific types of dependence between random variables.
Hoeffding's procedure based on $\Delta(F)$ seems to be the most sensitive of
the approaches mentioned, but due to computational complexity this
procedure has not been widely adopted. Lehmann (1966) introduces some
concepts relevant to studying dependence between random variables and
relates them to existing testing procedures.

Definition 4.3.2 A pair (X,Y) of random variables is said to be
positively quadrant dependent (PQD) if

\[ P(X \le x, Y \le y) \ge P(X \le x) P(Y \le y) \tag{4.3.21} \]

for all x, y. If the inequality is reversed in (4.3.21), then X and Y
are said to be negatively quadrant dependent (NQD). If the inequality
is strict for at least one pair (x, y), then X and Y are said to be
strictly quadrant dependent.

Lehmann (1966) states several theorems relating to positive quadrant
dependence. Some general results are given in the following remarks.

Remark 4.3.1 Any bivariate normal random variables with $\rho > 0$ are
PQD and with $\rho < 0$ are NQD.

Remark 4.3.2 If X and Y are PQD, then Spearman's Rho (4.3.7),
Kendall's Tau (4.3.8), and Blomqvist's q (4.3.14) are all nonnegative.

Remark 4.3.3 If X and Y are PQD, then $\mathrm{Cov}(X,Y) \ge 0$. This
generalizes Remark 4.3.1 and relates quadrant dependence to
covariance. This fact is based on a result due to Hoeffding which
states

\[ \mathrm{Cov}(X,Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [F(x,y) - F(x,\infty) F(\infty,y)]\, dx\, dy. \tag{4.3.22} \]

Observe that dividing inequality (4.3.21) by the positive quantity
$P(X \le x)$ yields

\[ P(Y \le y \mid X \le x) \ge P(Y \le y) \tag{4.3.23} \]

which implies that a knowledge of X being small increases the
probability of Y being small. This property is extended to the
concept of regression dependence.

Definition 4.3.3 If (X,Y) is a pair of random variables, then Y
is positively regression dependent (PRD) on X if $P(Y \le y \mid X = x)$ is
non-increasing in x. If $P(Y \le y \mid X = x)$ is non-decreasing in x one says
that Y is negatively regression dependent (NRD) on X.

Remark 4.3.4 Regression dependence is asymmetric, i.e., Y is PRD
on X does not imply X is PRD on Y. For an example of this asymmetry,
see Lehmann (1966, pp. 1145-1146).

Remark 4.3.5 If Y is PRD (NRD) on X, then X and Y are PQD (NQD).
The converse is not necessarily true. Hence, regression dependence is
stricter than quadrant dependence.

An even stricter condition than regression dependence is given by
the following definition.

Definition 4.3.4 Two random variables X and Y are said to be
positively likelihood ratio dependent if their joint p.d.f. satisfies

\[ f(x,y')\, f(x',y) \le f(x,y)\, f(x',y') \tag{4.3.24} \]

for all x < x', y < y'. If the inequality in (4.3.24) is reversed, X
and Y are said to be negatively likelihood ratio dependent.

Remark 4.3.6 Likelihood ratio dependence implies quadrant
dependence and is symmetric in X and Y.

Remark 4.3.7 Any bivariate normal random variables with $\rho > 0$ are
positively likelihood ratio dependent and with $\rho < 0$ are negatively
likelihood ratio dependent.

The concepts of regression dependence and likelihood ratio
dependence are primarily employed to verify quadrant dependence. The
property of quadrant dependence is one of the weakest conditions of
dependence for which the popular nonparametric tests are sensitive.
The last part of this section suggests some parameters that seem as
general as Hoeffding's $\Delta(F)$ in detecting any form of dependence
between two r.v.'s.

Kimeldorf and Sampson (1978) consider a condition known as
monotone dependence which requires the existence of a monotone
function g for which P[Y = g(X)] = 1. This condition is very restrictive
and implies total predictability of Y from X. A less restrictive
measure of monotone correlation is thus proposed.

Definition 4.3.5 The monotone correlation $\rho^*$ between two r.v.'s
X and Y is given by

\[ \rho^* = \sup\{\rho[f(X), g(Y)]\}, \tag{4.3.25} \]

where $\rho(X,Y)$ defines the correlation between X and Y and the
supremum is taken over all monotone functions f and g for which
$0 < \mathrm{Var}[f(X)] < \infty$ and $0 < \mathrm{Var}[g(Y)] < \infty$.

One may compare this to the sup correlation introduced by Gebelein
(see Kimeldorf and Sampson (1978) for reference) which is equivalent
to $\rho^*$ except the supremum is taken over all Borel-measurable
functions f and g. These concepts are more mathematical than
statistical and are only applied when the parent distributions are
known. Kimeldorf and Sampson (1978) do not suggest any estimators for
their correlation parameter nor do they propose any testing procedures
utilizing the concepts developed. They do, however, point out the
desirable property that $\rho^* = 0$ is equivalent to X and Y being
independent, and hence they have developed a correlation parameter of
special interest in tests of independence. Apparently sup correlation
has this same property. One need only find estimators of these
parameters to develop a powerful nonparametric test of independence.
Clearly this is an awesome task that has yet to be fully implemented.

Hajek and Sidak (1967) identify locally most powerful rank tests
(LMPRT) for testing for independence. Hoeffding (1948b) discusses the
problem of obtaining unbiased tests against all alternatives. Blum,
Kiefer, and Rosenblatt (1961) suggest a competitor to Hoeffding's
U-statistic approach based on a statistic analogous to the Cramer-von
Mises goodness-of-fit statistic, but their approach is as
computationally complex as Hoeffding's approach and has received
little attention in statistical applications. Gibbons (1971) and
Conover (1971) consider some traditional requirements that a measure
of association is expected to satisfy and check these requirements for
the popular nonparametric statistics. They fail to suggest updating
these requirements to include estimators of functionals like the one
proposed by Hoeffding (1948b). Until estimators and testing
procedures are developed for the correlation parameters considered by
Kimeldorf and Sampson (1978), the use of functionals to measure
dependence seems to be the most promising method of developing general
nonparametric procedures for bivariate data analysis.

4.4 Bivariate Density Estimation Using Information Criterion

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a bivariate random sample with joint
c.d.f. $F_{X,Y}$, marginals $F_X$, $F_Y$, and associated functions under the
usual notation. Form the uniform bivariate sample
$(Q_1/(n+1), R_1/(n+1)), \ldots, (Q_n/(n+1), R_n/(n+1))$ where $Q_j = \mathrm{rank}(X_j)$ and
$R_j = \mathrm{rank}(Y_j)$. One may then treat this as a sample from the dependence
density $d(u_1, u_2)$ to form estimates $\hat{d}(u_1, u_2)$ using generalizations of
the techniques of Chapter 3.
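
Forming the uniform bivariate sample is a rank transformation of each coordinate. A minimal sketch, assuming no ties and using the hypothetical function name `uniform_bivariate_sample`, is:

```python
import numpy as np

def uniform_bivariate_sample(x, y):
    """Map (X_j, Y_j) to (Q_j/(n+1), R_j/(n+1)) with Q_j = rank(X_j), R_j = rank(Y_j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    q = np.argsort(np.argsort(x)) + 1   # ranks 1..n of the X values (no ties assumed)
    r = np.argsort(np.argsort(y)) + 1   # ranks 1..n of the Y values (no ties assumed)
    return q / (n + 1.0), r / (n + 1.0)
```

Dividing by n+1 rather than n keeps every transformed point strictly inside the unit square, and each margin of the transformed sample is evenly spaced on (0, 1) whatever the original marginal distributions were.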

The nearest neighbor technique of section 3.2 requires no
generalization since it was designed for multivariate density
estimation. The theorems stated hold for the bivariate case. One
notes  that  this  technique  has  an  advantage  over  other  bivariate 
procedures  in  that  subjective  considerations  of  the  smoothing 
parameter  k  (n)  are  not  unduly  complicated  by  multivariate 
generalizations.  The  value  V^(x),  however,  becomes  more  complicated 
in  higher  dimensional  settings.  This  remains  one  of  the  easier 
computational  techniques  in  the  bivariate  case  when  compared  with 
other  approaches. 

Cacoullos (1966) generalizes kernel density estimation to include
multivariate estimators of the form

\[ f_n(x_1, \ldots, x_p) = \left( 1 \Big/ n \prod_{i=1}^{p} h_i \right) \sum_{j=1}^{n} K[(x_1 - X_{j1})/h_1, \ldots, (x_p - X_{jp})/h_p]. \tag{4.4.1} \]

Often one chooses the kernel function to be a product of univariate
kernels. Cacoullos (1966) proves multivariate extensions to most of
the theorems found in Parzen (1962). In the bivariate case, one has

\[ f_n(x,y) = [1/(n h_1 h_2)] \sum_{j=1}^{n} K[(x - X_j)/h_1, (y - Y_j)/h_2] \tag{4.4.2} \]

which is usually taken to be

\[ f_n(x,y) = [1/(n h_1 h_2)] \sum_{j=1}^{n} K_1[(x - X_j)/h_1]\, K_2[(y - Y_j)/h_2] \tag{4.4.3} \]

for univariate kernels $K_1$ and $K_2$. For this estimator, the window
width problem is essentially raised to a power of 2. For example,
looking at estimates for three different window widths in the
univariate case would expand to looking at nine different estimators
in the bivariate case to include all possible combinations of window
widths. Nonetheless, this technique remains one of the more popular
methods of bivariate density estimation.
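
The product kernel estimator (4.4.3) can be sketched as follows. The Gaussian kernel is an assumption chosen for illustration (the text leaves the univariate kernels unspecified), and the names `gaussian_kernel`, `product_kernel_density`, and `window_pairs` are hypothetical.

```python
import numpy as np
from itertools import product

def gaussian_kernel(t):
    """Standard normal density, used here for both univariate kernels."""
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def product_kernel_density(x, y, data_x, data_y, h1, h2):
    """Evaluate the product kernel estimate (4.4.3) at the single point (x, y)."""
    data_x = np.asarray(data_x, dtype=float)
    data_y = np.asarray(data_y, dtype=float)
    kx = gaussian_kernel((x - data_x) / h1)
    ky = gaussian_kernel((y - data_y) / h2)
    return float(np.sum(kx * ky) / (data_x.size * h1 * h2))

# Three candidate window widths per coordinate give nine (h1, h2) combinations.
window_pairs = list(product((0.2, 0.4, 0.8), repeat=2))
```

Looping over `window_pairs` and plotting each estimate is the bivariate counterpart of comparing three window widths in the univariate case.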

Tartar  and  Kronmal  (1970)  consider  p-dimensional  Fourier 
expansion  methods  to  obtain  some  theoretical  results  for  multivariate 
density  estimation.  Tartar  and  Silvers  (1975)  apply  an  orthogonal 
expansion  technique  to  the  estimation  of  a  bivariate  density  and 
suggest  theoretical  implications  and  applications  for  decomposing  a 
mixture  of  Gaussian  distributions.  We  propose  a  new  approach  that  is 
based  on  the  Tartar  and  Kronmal  estimation  scheme  except  that  a 
different  estimation  criterion  is  employed.  The  approach  is  motivated 
by  seeking  a  more  sophisticated  estimation  criterion  than  the  method 
of moments and basing it on a "better" initial estimate than the
empirical c.d.f. We will introduce this new estimator by first
considering the univariate case.

Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with p.d.f. f. Let log f be in
$L^2(a,b)$. If $\{\phi_k(x)\}_{k=-\infty}^{\infty}$ is a complete
orthonormal system of functions in $L^2(a,b)$, then

$$\log f(x) = \sum_{k=-\infty}^{\infty} \theta_k \phi_k(x) - C(\theta), \quad a \le x \le b, \qquad (4.4.4)$$

where $\{\theta_k\}_{k=-\infty}^{\infty}$ are real valued constants and
$C(\theta)$ is an integrating factor to ensure that f(x) integrates to one.
Following Crain (1974), one may consider order m approximators

$$\log f_m(x) = \sum_{k=-m}^{m} \theta_k \phi_k(x) - C(\theta), \quad a \le x \le b, \qquad (4.4.5)$$

and attempt to find estimates $\{\hat\theta_k\}_{k=-m}^{m}$ that possess
desirable statistical properties to yield an estimate of f(x).

If one chooses the criterion of minimum information, one seeks
parameter estimates that minimize $I(f_m; \hat f_m)$ where

$$\log \hat f_m(x) = \sum_{k=-m}^{m} \hat\theta_k \phi_k(x) - C(\hat\theta), \quad a \le x \le b. \qquad (4.4.6)$$

However, $f_m$ is unrealizable so that the quantity $I(f_m; \hat f_m)$ can
only be examined from a limiting perspective as in Crain (1974). Furthermore,
different choices of m yield different estimators with the ultimate
goal being the estimation of the true density f.
To overcome these problems consider an alternate definition of
information. Recall the definition of the bi-information between two
densities given by (2.3.10). Using this measure of information as an
estimation criterion, the problem may be rewritten to resemble an
exercise in continuous parameter regression analysis.

Let $\tilde f_n$ be a "raw" estimator of f satisfying

i) $\tilde f_n \to f$ in q.m. or a.s., and
ii) $\{\tilde f_n(x)\}$ is asymptotically a Gaussian process.          (4.4.7)

Conditions (4.4.7), Theorem 3.1.1, and Theorem 3.1.2 guarantee the
appropriate behaviour for $\log \tilde f_n$ as required below. The
conditions are stated in terms of $\tilde f_n$ since this is how they are
usually found in the literature. Consider the approximate model

$$\log \tilde f_n(x) = \sum_{k=-m}^{m} \theta_k \phi_k(x) + \sigma_n G(x), \quad a \le x \le b, \qquad (4.4.8)$$

where G(x) is a Gaussian process and $\sigma_n$ is a "generalized variance".
Using bi-information based on empirical measure and representation
(4.4.6) for an estimator of f, observe

$$II(\tilde f_n; \hat f_m) = \int_a^b \Big| \log \tilde f_n(x) - \sum_{k=-m}^{m} \theta_k \phi_k(x) \Big|^2 dF_n(x)
= (1/n) \sum_{i=1}^{n} \Big| \log \tilde f_n(X_i) - \sum_{k=-m}^{m} \theta_k \phi_k(X_i) \Big|^2. \qquad (4.4.9)$$


The constant term in practice is omitted during the parameter
estimation phase and re-introduced later as an integrating factor.
Equation (4.4.9) indicates that minimum bi-information estimators are
equivalent to least squares estimators. Estimators of the form
(4.4.6) may then be easily derived using one's favorite least squares
regression computer program. Furthermore, if $\tilde f_n(x)$ is chosen so
that $\log \tilde f_n$ is in $L^2(a,b)$, the approximation theory for
Hilbert spaces ensures that the least squares estimates of the parameters
will be Fourier coefficients for a suitably orthonormalized system of
"independent variables".

One  still  faces  the  problem  of  determining  the  "optimal"  order  m, 
but  in  the  regression  framework  several  approaches  are  suggested. 
Hocking  (1976)  considers  a  variety  of  stepwise  regression  techniques 
that may be useful in selecting a best order m. Time series criterion
functions used in determining optimal orders for autoregressive models
may also be useful, with Parzen's CAT (Criterion Autoregressive Transfer)
function and Akaike's information criterion (AIC) function being
primary candidates for consideration. The MISE criterion of Tartar
and Kronmal (1970, 1976) may also be employed.
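
For a fixed order m chosen by hand, the least squares reading of (4.4.9) can
be sketched as follows. Several choices here are assumptions made for
illustration and are not from the text: a real cosine system stands in for
$\{\phi_k\}$ (with indices 1 to m rather than -m to m), a Gaussian kernel
estimate with window width `h` plays the role of the raw estimator
$\tilde f_n$, and the integrating factor is recovered by a trapezoid rule.

```python
import numpy as np

def cosine_basis(x, m, a, b):
    """Columns phi_1..phi_m of a real cosine system, orthonormal in L2(a, b)."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, m + 1)
    return np.sqrt(2.0 / (b - a)) * np.cos(np.pi * np.outer(x - a, k) / (b - a))

def raw_log_density(points, sample, h):
    """Log of a Gaussian kernel estimate, standing in for the raw estimator."""
    t = (np.asarray(points, dtype=float)[:, None] - sample[None, :]) / h
    dens = np.exp(-0.5 * t**2).sum(axis=1) / (sample.size * h * np.sqrt(2.0 * np.pi))
    return np.log(dens)

def fit_min_bi_information(sample, m, a, b, h):
    """Least squares fit of the raw log density at the X_i on the basis."""
    sample = np.asarray(sample, dtype=float)
    design = np.column_stack([np.ones(sample.size), cosine_basis(sample, m, a, b)])
    coef, *_ = np.linalg.lstsq(design, raw_log_density(sample, sample, h), rcond=None)
    return coef[1:]   # the constant is dropped and restored by normalizing

def density_estimate(grid, theta, a, b):
    """exp(sum theta_k phi_k) on an evenly spaced grid, rescaled to integrate to one."""
    unnorm = np.exp(cosine_basis(grid, theta.size, a, b) @ theta)
    dx = grid[1] - grid[0]
    integral = dx * (unnorm.sum() - 0.5 * (unnorm[0] + unnorm[-1]))  # trapezoid rule
    return unnorm / integral
```

Refitting with several values of `m` and comparing the resulting curves is
one informal way to explore the order selection question raised above.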

To emphasize the Hilbert space approach to approximation theory,
suppose that the estimator $\log \tilde f_n$ is defined so as to be square
integrable (which is usually the case since most estimates will be
bounded with finite support). Let $\{\phi_k(x)\}$ be an orthogonal system
w.r.t. $dF_n(x)$, i.e.,

$$\int_a^b \phi_i(x)\, \phi_j(x)\, dF_n(x) = \delta(i,j), \qquad (4.4.10)$$


where $\delta(i,j)$ is Kronecker's delta. Observe,

$$\begin{aligned}
II(\tilde f_n; \hat f_m) &= \int_a^b \Big| \log \tilde f_n(x) - \sum_{k=-m}^{m} \theta_k \phi_k(x) \Big|^2 dF_n(x) \\
&= \int_a^b |\log \tilde f_n(x)|^2\, dF_n(x) + \int_a^b \Big| \sum_{k=-m}^{m} \theta_k \phi_k(x) \Big|^2 dF_n(x) \\
&\quad - 2 \sum_{k=-m}^{m} \theta_k \int_a^b [\log \tilde f_n(x)]\, \phi_k(x)\, dF_n(x).
\end{aligned}$$

Squaring the appropriate terms and taking advantage of the
orthogonality property (4.4.10) one obtains

$$II(\tilde f_n; \hat f_m) = (1/n) \sum_{i=1}^{n} [\log \tilde f_n(X_i)]^2 + \sum_{k=-m}^{m} \theta_k^2 - (2/n) \sum_{k=-m}^{m} \theta_k \sum_{i=1}^{n} [\log \tilde f_n(X_i)]\, \phi_k(X_i).$$

Taking derivatives w.r.t. $\theta_k$ and setting the equations equal to zero,
one has

$$\hat\theta_k = (1/n) \sum_{i=1}^{n} [\log \tilde f_n(X_i)]\, \phi_k(X_i), \qquad (4.4.11)$$

which is the Fourier coefficient of the expansion (4.4.5) w.r.t.
empirical measure. One may easily verify that the estimates defined
by (4.4.11) indeed minimize $II(\tilde f_n; \hat f_m)$ so that a minimum
bi-information estimator has been obtained.

For the estimator $\hat f_m(x)$ given by

$$\hat f_m(x) = \exp\Big[ \sum_{k=-m}^{m} \hat\theta_k \phi_k(x) \Big] \qquad (4.4.12)$$

to be consistent, the estimator $\tilde f_n(x)$ must be chosen so that

$$[\log \tilde f_n(x) - \log \hat f_m(x)] = o(1/\sqrt{n}) \quad \text{a.s.} \qquad (4.4.13)$$

By Minkowski's inequality,

$$\begin{aligned}
\| \log \hat f_m(x) - \log f(x) \| &= \| \log \hat f_m(x) - \log \tilde f_n(x) + \log \tilde f_n(x) - \log f(x) \| \\
&\le \| \log \hat f_m(x) - \log \tilde f_n(x) \| + \| \log \tilde f_n(x) - \log f(x) \|.
\end{aligned}$$

The second term of the right hand side of the inequality converges
almost surely to zero by assumptions (4.4.7), and hence $\hat f_m(x)$
converges almost surely to f(x) by assumption (4.4.13) and Theorem
3.1.1. Although this result seems straightforward, one observes that
it may be very difficult to verify assumption (4.4.13) for a
particular estimator $\tilde f_n(x)$ because of the difficulty in
understanding the behaviour of the two estimators as both m and n approach
infinity.

To show the asymptotic normality of $\hat f_m(x)$, let conditions (4.4.7)
and (4.4.13) hold with $\log \tilde f_n(x)$ being $AN[\log f(x), \sigma_n^2]$.
Furthermore, let the asymptotic variance $\sigma_n^2$ be independent of f(x)
and let $\sigma_n^2 \to 0$ and $n\sigma_n^2 \to \infty$ as $n \to \infty$.
This usually follows when $\tilde f_n(x)$ is consistent for estimating f(x)
and taking the logarithm of $\tilde f_n(x)$ is a variance stabilizing
transformation. Then


$$\begin{aligned}
[\log \hat f_m(x) - \log f(x)]/\sigma_n
&= [\log \hat f_m(x) - \log \tilde f_n(x)]/\sigma_n + [\log \tilde f_n(x) - \log f(x)]/\sigma_n \\
&= (1/\sigma_n\sqrt{n})\, \sqrt{n}\, [\log \hat f_m(x) - \log \tilde f_n(x)] + [\log \tilde f_n(x) - \log f(x)]/\sigma_n \\
&= (1/\sigma_n\sqrt{n})\, A(n) + B(n).
\end{aligned}$$

By assumption, $(1/\sigma_n\sqrt{n}) \to 0$ as $n \to \infty$, and by
application of (4.4.13), $(1/\sigma_n\sqrt{n})\, A(n) \to 0$ in probability
as $n \to \infty$. Furthermore, B(n) converges in distribution to a N(0,1)
r.v. by choice of initial estimator $\tilde f_n(x)$, and hence by Slutsky's
Theorem, $\log \hat f_m(x)$ is $AN[\log f(x), \sigma_n^2]$.

Condition  (4.4.13)  severely  limits  these  results  and  has  not  been 
shown to hold for any of the common nonparametric estimators. The
nearest neighbor estimate satisfies the asymptotic normality
requirements with stabilized variance, but the more stringent
condition (4.4.13) has not been verified. Nonetheless, the estimator
$\hat f_m(x)$ is intuitively appealing as well as being the optimal
estimator by use of an information criterion. Applications considered in
Chapter 6 will further support the use of this new estimator by
considering comparisons with some of the nonparametric density
estimators discussed in Chapter 3. However, our main concern is
bivariate  extensions  in  the  quantile  domain  to  overcome  some  of  the 
weaknesses  of  other  bivariate  density  estimators. 

The  extension  of  the  orthogonal  expansion  technique  to  the 
bivariate  case  is  relatively  straightforward.  A  bivariate  orthonormal 
system is fairly easy to obtain as the following theorem indicates.

Theorem 4.4.1 Let $\{\phi_k(x)\}_{k=-\infty}^{\infty}$ be a complete
orthonormal system for $L^2(a,b)$. Then
$\{\phi_j(x)\phi_k(y)\}_{j,k=-\infty}^{\infty}$ is a complete orthonormal
system for $L^2(A)$ where $A = \{(x,y): a \le x, y \le b\}$.

Proof:

$$\begin{aligned}
\int_a^b \int_a^b [\phi_i(x)\phi_j(y)]\,[\phi_k(x)\phi_l(y)]\, dx\, dy
&= \int_a^b [\phi_i(x)\phi_k(x)]\, dx \int_a^b [\phi_j(y)\phi_l(y)]\, dy \\
&= \begin{cases} 1 & \text{if } i = k \text{ and } j = l \\ 0 & \text{otherwise.} \end{cases}
\end{aligned}$$

Hence, $\{\phi_j(x)\phi_k(y)\}_{j,k=-\infty}^{\infty}$ is an orthonormal
system for $L^2(A)$. To show that this system is complete, first observe that
if $g(x,y) \in L^2(A)$, then for fixed y, treated as a function of x,
$g(x,y) \in L^2(a,b)$, and vice versa. Define

$$h(x) = \int_a^b \phi_k(y)\, g(x,y)\, dy.$$

Then

$$\int_a^b \int_a^b \phi_j(x)\phi_k(y)\, g(x,y)\, dx\, dy = 0$$

implies

$$\int_a^b \phi_j(x)\, h(x)\, dx = 0$$

and by completeness of $\{\phi_k(x)\}$ it follows that h(x) = 0 a.e. Thus, by
definition of h(x),

$$\int_a^b \phi_k(y)\, g(x,y)\, dy = 0$$

which implies for fixed x, g(x,y) = 0 a.e. Reversing the order of
integration one obtains g(x,y) = 0 a.e. and hence
$\{\phi_j(x)\phi_k(y)\}_{j,k=-\infty}^{\infty}$ is a complete orthonormal
system for $L^2(A)$. $\blacksquare$

This  theorem  allows  us  to  employ  one  of  the  many  popular  univariate 
orthonormal  systems  in  the  approximation  of  bivariate  densities. 
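
The orthonormality half of Theorem 4.4.1 is easy to check numerically for a
particular system. The sketch below assumes the real cosine system
$\phi_0 = 1$, $\phi_k(x) = \sqrt{2}\cos(k\pi x)$ on (0,1), which is an
illustrative choice rather than the system used in this work, and
approximates the double integrals by a midpoint rule.

```python
import numpy as np

def phi(k, x):
    """Real cosine system on (0, 1): phi_0 = 1, phi_k = sqrt(2) cos(k pi x)."""
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)

def product_gram(m, n_grid=400):
    """Gram matrix of {phi_j(x) phi_k(y)}, 0 <= j, k <= m, by the midpoint rule."""
    u = (np.arange(n_grid) + 0.5) / n_grid                      # cell midpoints
    one_dim = np.stack([phi(k, u) for k in range(m + 1)])       # (m+1, n_grid)
    # basis[j, k, a, b] = phi_j(u_a) * phi_k(u_b)
    basis = one_dim[:, None, :, None] * one_dim[None, :, None, :]
    flat = basis.reshape((m + 1) ** 2, n_grid * n_grid)
    return flat @ flat.T / n_grid**2
```

For this system `product_gram(2)` should agree with the 9 by 9 identity
matrix up to rounding error, as the product construction predicts.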

Orthogonal  expansion  techniques  still  possess  the  problem  of 
choosing  an  order  of  approximation.  However,  as  with  nearest  neighbor 
density  estimation,  the  problem  is  not  as  sensitive  to  dimensionality 
increases as with the kernel method. Tartar and Kronmal (1970, 1976)
suggest a stopping rule in sequentially adding terms that is based on
the sample mean integrated square error. Such stopping rules are
useful but may prevent one from observing interesting shapes that may
result  from  addition  of  extra  terms.  Furthermore,  some  degree  of 
subjectivity  is  always  inherent  in  order  selection  criteria  despite 
heuristic  motivations. 

To  use  the  techniques  of  section  applied  to  estimating 
$d(u_1,u_2)$, the following theorem is necessary.

Theorem 4.4.2 Let $\log f_X$, $\log f_Y$ be in $L^2(a,b)$ and let
$\log f_{X,Y}$ be in $L^2(A)$ where A is defined as in Theorem 4.4.1. If
$d(u_1,u_2)$ is defined as in (2.2.17), then log d is in $L^2(B)$ where B is
the unit square, $B = \{(u_1,u_2): 0 \le u_1, u_2 \le 1\}$.

Proof: Observe

$$\begin{aligned}
\int_0^1 \int_0^1 |\log d(u_1,u_2)|^2\, du_1 du_2
&= \int_0^1 \int_0^1 \Big| \log \frac{f_{X,Y}(Q_X(u_1), Q_Y(u_2))}{f_X(Q_X(u_1))\, f_Y(Q_Y(u_2))} \Big|^2 du_1 du_2 \\
&= \int_0^1 \int_0^1 | \log f_{X,Y}(Q_X(u_1), Q_Y(u_2)) - \log f_X(Q_X(u_1)) - \log f_Y(Q_Y(u_2)) |^2\, du_1 du_2 \\
&= \int_0^1 \int_0^1 \{ |\log f_{X,Y}(Q_X(u_1), Q_Y(u_2))|^2 + |\log f_X(Q_X(u_1))|^2 + |\log f_Y(Q_Y(u_2))|^2 \\
&\qquad\qquad + (\text{cross product terms}) \}\, du_1 du_2 \\
&\le \int_0^1 \int_0^1 |\log f_{X,Y}|^2\, du_1 du_2 + \int_0^1 |\log f_X|^2\, du_1 + \int_0^1 |\log f_Y|^2\, du_2 \\
&\quad + 2 \int_0^1 \int_0^1 |\log f_{X,Y} \log f_X|\, du_1 du_2 + 2 \int_0^1 \int_0^1 |\log f_{X,Y} \log f_Y|\, du_1 du_2 \\
&\quad + 2 \int_0^1 \int_0^1 |\log f_X \log f_Y|\, du_1 du_2
\end{aligned}$$

by Minkowski's inequality, where we have adopted the abbreviated
notation $f_{X,Y} = f_{X,Y}(Q_X(u_1), Q_Y(u_2))$, $f_X = f_X(Q_X(u_1))$, and
$f_Y = f_Y(Q_Y(u_2))$.

By assumption, the first three terms on the right hand side of the
inequality are finite. By Hölder's inequality,

$$\int_0^1 \int_0^1 |\log f_{X,Y} \log f_X|\, du_1 du_2 \le \Big[ \int_0^1 \int_0^1 |\log f_{X,Y}|^2\, du_1 du_2 \Big]^{1/2} \Big[ \int_0^1 |\log f_X|^2\, du_1 \Big]^{1/2} < \infty,$$

with the last inequality following by assumption. The finiteness of
the remaining terms follows similarly, and hence the theorem is
proved. $\blacksquare$


The fact that log d is in $L^2(B)$ allows an orthogonal expansion of

log  d,  and  hence  one  can  apply  the  approximation  techniques  of  section 

3. 

The  use  of  complex  exponential  orthogonal  systems  in  a  Fourier 
expansion  will  necessitate  the  use  of  complex  least  squares  procedures 
to  carry  out  the  minimum  information  approach.  Specifically,  one 
considers  the  expansion 


$$\log d(u_1,u_2) = \sum_{j,k=-\infty}^{\infty} \theta_{jk}\, \phi_j(u_1)\, \phi_k(u_2) - C(\theta) \qquad (4.4.14)$$

where the $\{\theta_{jk}\}_{j,k=-\infty}^{\infty}$ are complex valued
parameters, $\{\phi_j(u)\}_{j=-\infty}^{\infty}$ is a univariate complex
orthogonal system, and $C(\theta)$ is a complex integrating factor. Since
$\log d(u_1,u_2)$ is real, the contribution of the complex terms must vanish.
This will occur if conjugate pairs always appear together. For a finite
parameter model of order m, one estimates the
$\{\theta_{jk}\}_{j,k=-m}^{m}$ by complex least squares after deriving an
initial estimate $\tilde d(u_1,u_2)$ of $d(u_1,u_2)$. The minimum
bi-information estimate $\hat d(u_1,u_2)$ is then obtained from the model

$$\log \hat d(u_1,u_2) = \sum_{j,k=-m}^{m} \hat\theta_{jk}\, \phi_j(u_1)\, \phi_k(u_2) - C(\hat\theta) \qquad (4.4.15)$$

where $C(\hat\theta)$ is chosen so that $\hat d(u_1,u_2)$ integrates to one.
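
A rough sketch of this fitting procedure is given below. It departs from
(4.4.15) in ways that should be read as assumptions: products of a real
cosine system replace the complex exponential system (so ordinary rather than
complex least squares is used), a product Gaussian kernel estimate with a
hypothetical window width `h` serves as the initial estimate, boundary
effects of that kernel estimate on the unit square are ignored, and the
integrating factor is recovered by a midpoint rule on an `n_grid` by `n_grid`
mesh. Ties in the data are assumed absent.

```python
import numpy as np

def rank_transform(v):
    """Ranks divided by n + 1, so every value lies strictly inside (0, 1)."""
    v = np.asarray(v, dtype=float)
    return (np.argsort(np.argsort(v)) + 1.0) / (v.size + 1.0)

def cos_basis(u, m):
    """Columns phi_0 = 1 and phi_k(u) = sqrt(2) cos(k pi u), k = 1..m."""
    u = np.asarray(u, dtype=float)
    cols = [np.ones_like(u)]
    cols += [np.sqrt(2.0) * np.cos(k * np.pi * u) for k in range(1, m + 1)]
    return np.column_stack(cols)

def product_design(u1, u2, m):
    """All products phi_j(u1) phi_k(u2), 0 <= j, k <= m; column 0 is the constant."""
    b1, b2 = cos_basis(u1, m), cos_basis(u2, m)
    return (b1[:, :, None] * b2[:, None, :]).reshape(b1.shape[0], -1)

def raw_dependence_density(p1, p2, u1, u2, h):
    """Product Gaussian kernel estimate built from the rank-transformed data."""
    k1 = np.exp(-0.5 * ((p1[:, None] - u1[None, :]) / h) ** 2)
    k2 = np.exp(-0.5 * ((p2[:, None] - u2[None, :]) / h) ** 2)
    return (k1 * k2).sum(axis=1) / (u1.size * 2.0 * np.pi * h * h)

def fit_dependence_density(x, y, m=1, h=0.15, n_grid=50):
    """Regress the log of the raw estimate on the product basis, then normalize."""
    u1, u2 = rank_transform(x), rank_transform(y)
    log_raw = np.log(raw_dependence_density(u1, u2, u1, u2, h))
    theta, *_ = np.linalg.lstsq(product_design(u1, u2, m), log_raw, rcond=None)
    theta[0] = 0.0   # constant term omitted; restored by the normalization below
    g = (np.arange(n_grid) + 0.5) / n_grid
    g1, g2 = np.meshgrid(g, g, indexing="ij")
    d_hat = np.exp(product_design(g1.ravel(), g2.ravel(), m) @ theta)
    d_hat = d_hat.reshape(n_grid, n_grid)
    return theta, d_hat / d_hat.mean()   # grid mean = midpoint-rule integral
```

Because only the ranks enter the computation, applying increasing
transformations to either variable leaves the fitted surface unchanged.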

Some consequences of this approach are worth noting. If the
initial estimate $\tilde d(u_1,u_2)$ is also derived using only the rank
transformations of the data, then $\hat d(u_1,u_2)$ is a fully nonparametric
estimator requiring only an assumption of continuous data and square
integrability of the logarithms of the underlying joint and marginal
p.d.f.'s. Furthermore, $\hat d(u_1,u_2)$ is invariant to monotone
transformations of the data since it is a ranking procedure. The
parametric representation of $\hat d(u_1,u_2)$ permits complete specification
of the model by only knowing the values of $m^2$ estimates of the
parameters, unlike the nearest neighbor, kernel, and penalized
likelihood approaches. The problem of defining the region of support
is surmounted by the uniform transformations to the unit square. The
estimate also possesses many of the desirable properties of its
univariate counterpart although asymptotic properties are confounded
further by approximating a uniform data set by rank transformed data.
The problem of smoothing or order determination has many
heuristic rules of thumb, all of which need further research. The CAT
and AIC criterion functions seem to recommend too many parameters in
initial investigations done by the author, while a minimum information
(maximum entropy) criterion seems to pick too few parameters and hence
produces an overly smoothed estimate. As mentioned earlier, Tartar
and Kronmal (1970, 1976) suggest a minimum MISE criterion that picks
the smallest order m such that


$$\hat\theta_{m,m}\, \hat\theta_{-m,-m} > \cdots/(n+1). \qquad (4.4.16)$$


One  is  naturally  concerned  that  the  inclusion  of  too  many  parameters 
will  introduce  spurious  modes,  so  one  recommendation  is  to  produce 
three  estimates  with  varying  degrees  of  smoothness.  One  may  then  hope 
that  the  physical  constraints  of  the  problem  or  the  expertise  of  the 
experimenter  will  aid  in  model  selection. 

Interpreting three dimensional graphs and contour plots of
estimates of the dependence density is particularly difficult, due in
part to the radical nature of this approach to data analysis.
Consequently, one may prefer to form estimates of the bivariate
density-quantile function $fQ(u_1,u_2) = f(Q_X(u_1), Q_Y(u_2))$. The
approach we favor forms

$$\widehat{fQ}(u_1,u_2) = \hat d(u_1,u_2)\, \widehat{fQ}_X(u_1)\, \widehat{fQ}_Y(u_2) \qquad (4.4.17)$$

where $\hat d(u_1,u_2)$ is a minimum bi-information estimator of $d(u_1,u_2)$
and the estimated density-quantiles are obtained using the autoregressive
method. This approach allows one to take advantage of the
autoregressive approach to univariate data modeling. Goodness-of-fit
tests may be conducted for null distributions of the univariate
densities.

Experience  with  this  approach  reveals  several  interesting 
features  that  are  of  importance  in  bivariate  data  analysis.  For  local 
alternatives  to  independence,  in  particular,  for  cases  when  the  linear 
correlation  exists  and  is  "small",  the  univariate  density-quantiles 
dominate the shaping of the bivariate density-quantile. If either
univariate density is bimodal and the correlation is small, $\hat d(u_1,u_2)$
will closely approximate a flat surface so that the influence of the
bimodal  univariate  density  will  create  a  bimodal  or  multimodal 
bivariate  density-quantile.  However,  when  two  random  variables  are 
highly  correlated,  the  dependence  density  dominates  the  shaping  of  the 
bivariate  density-quantile  and  tends  to  smooth  out  any  anomalies  in 
the  univariate  density-quantiles. 

In  practical  applications  one  is  particularly  concerned  that  a 
univariate  density  estimation  technique  not  introduce  modes  that  will 
unduly  affect  the  bivariate  density-quantile  function.  An  example  in 
Chapter  6  illustrates  a  situation  where  an  outlier  in  a  data  set 
introduces  a  spurious  mode  in  the  univariate  density  estimate  of  one 
of the variables thereby causing the bivariate fQ function to be
multimodal. The estimate $\hat d(u_1,u_2)$ is unaffected by the outlier, but
the autoregressive approach is unduly influenced. For this example,
the  outlier  was  easily  detected  so  that  it  could  be  removed,  but  one 
remains  concerned  about  the  sensitivity  of  the  autoregressive 
estimate.  In  particular,  one  is  interested  in  the  ability  of  the  AR 


approach to detect unimodal and bimodal shapes, since clearly outliers
that cannot be explained by measurement error usually suggest
bimodality. A study was performed for 50 iterations of three types of
samples of size 100. The first sample represents a N(0,1)
distribution, the second represents a sample from the mixture
0.5 N(0,1) + 0.5 N(2,2), and the third sample comes from the mixture
0.5 N(0,1) + 0.5 N(3,2). The results are given in Table 2. Order 0
values  indicate  "acceptance"  of  a  null  hypothesis  of  normality.  Order 
When the modes are distinct as in the third case, the AR modeling
approach  performs  well,  whereas  in  case  2  the  technique  found  it 
difficult  to  distinguish  the  modes.  Note  that  high  order  selections 
are  very  rare  for  "smooth"  parent  densities.  One  interpretation  of 
the  selection  of  a  high  order  is  that  outliers  may  be  present  in  the 
data,  which  is  useful  for  the  application  of  bivariate  density 
estimation  to  data  analysis. 

In  Chapter  6  we  will  illustrate  the  use  of  the  estimated 
bivariate  density-quantile  to  locate  modes  in  a  bivariate 
distribution.  Naturally,  one  may  wish  to  investigate  this  approach 
with  different  estimators  for  the  univariate  density-quantile 
functions,  but  as  stated,  we  feel  that  the  AR  approach  has  the  most 
objective  and  consistent  results. 

4.5 Some Entropy-Based Measures of Association

In section 4.2 some popular nonparametric tests of independence
were considered and viewed in light of some concepts and measures of
dependence. The observation was made that only certain functionals
and variations of sup correlation were general enough to detect all
deviations from independence. In this section a new functional based
on the concept of information is introduced that is as general as
Hoeffding's $\Delta(F)$, and various testing procedures are proposed using
this new measure of dependence.

Let (X,Y) be bivariate random variables with joint c.d.f. $F_{X,Y}$,
joint p.d.f. $f_{X,Y}$, marginal c.d.f.'s $F_X$ and $F_Y$, marginal p.d.f.'s
$f_X$ and $f_Y$, and quantile functions $Q_X$ and $Q_Y$. The dependence
distribution function $D(u_1,u_2)$ and the dependence density $d(u_1,u_2)$
are defined as in section 2.2, equations (2.2.16) and (2.2.17). Using
equation (2.3.1) defining information, one obtains the information between
the joint p.d.f. $f_{X,Y}$ and the product of the marginals $f_X$ and $f_Y$ by

$$I(f_{X,Y}; f_X f_Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \{ \log [ f_{X,Y}(x,y) / f_X(x) f_Y(y) ] \}\, f_{X,Y}(x,y)\, dx\, dy. \qquad (4.5.1)$$

With the usual change of variable $u_1 = F_X(x)$ and $u_2 = F_Y(y)$, one
obtains

$$I(f_{X,Y}; f_X f_Y) = \int_0^1 \int_0^1 \Big\{ \log \frac{f_{X,Y}(Q_X(u_1), Q_Y(u_2))}{f_X(Q_X(u_1))\, f_Y(Q_Y(u_2))} \Big\}\, f_{X,Y}(Q_X(u_1), Q_Y(u_2))\, q_X(u_1)\, q_Y(u_2)\, du_1 du_2 \qquad (4.5.2)$$

which reduces to

$$I(f_{X,Y}; f_X f_Y) = \int_0^1 \int_0^1 [\log d(u_1,u_2)]\, d(u_1,u_2)\, du_1 du_2 = -H(d), \qquad (4.5.3)$$

where H(d) is the entropy of the dependence density. From the
information inequality one obtains

$$I(f_{X,Y}; f_X f_Y) = 0 \quad \text{iff} \quad f_{X,Y} = f_X f_Y, \qquad (4.5.4)$$

and thus equation (4.5.3) leads to new techniques for ascertaining
whether X and Y are independent.

The  technique  to  be  investigated  will  estimate  H  (d)  by  estimating 
d  and  using  various  numerical  or  statistical  integration  procedures. 
Then  Monte  Carlo  studies  will  be  employed  to  investigate  the 
properties  of  the  estimator. 

One solution is to estimate d by $\tilde d$ and then form

$$H(\tilde d) = -\int_0^1 \int_0^1 [\log \tilde d(u_1,u_2)]\, d\tilde D(u_1,u_2)
= -(1/n) \sum_{k=1}^{n} \log \tilde d(Q_k/(n+1), R_k/(n+1)) \qquad (4.5.5)$$

where $\tilde D(u_1,u_2)$ is the empirical dependence distribution function
with jumps of size 1/n at the points $(Q_k/(n+1), R_k/(n+1))$. Recall,
$Q_k = \mathrm{rank}(X_k)$ and $R_k = \mathrm{rank}(Y_k)$. Another solution
might be to numerically integrate

$$H(\hat d) = -\int_0^1 \int_0^1 [\log \hat d(u_1,u_2)]\, \hat d(u_1,u_2)\, du_1 du_2 \qquad (4.5.6)$$

at a suitable grid of points, but this approach seems somewhat
artificial with inherent extrapolation problems that might provide
deceptive results.

To form $\tilde d$, one treats
$(Q_1/(n+1), R_1/(n+1)), \ldots, (Q_n/(n+1), R_n/(n+1))$
as a random sample from a bivariate uniform distribution and then
applies one of the density estimation techniques of Chapter 3 that
permit bivariate generalizations. Two techniques will be mentioned
here for obtaining H(d).

One approach is to subjectively decide upon the best window width
or smoothing parameter to produce a kernel or nearest neighbor
bivariate density estimate $\tilde d(u_1,u_2)$. One observes that such a
subjective approach unduly complicates the procedure, but such
problems cannot be overcome. The second approach is to form
$\hat d(u_1,u_2)$ using $\tilde d(u_1,u_2)$ as a dependent variable in the
regression approach of the last section. One then may form estimator
(4.5.5) or (4.5.6).

The parametric representation afforded by the regression approach
makes method two less computationally cumbersome than it would be in
the first approach. Hence, one uses (4.5.5) for a suitable choice of
$\tilde d$ and (4.5.6) for the parametrically smoothed bi-information
density.
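
The raw statistic (4.5.5) can be sketched with a nearest neighbor estimate
computed from the scaled ranks. The particular form used below, k divided by
n times the area of the disc reaching the k-th nearest neighbor, is an
assumed textbook form and may differ in detail from the estimator of section
3.2; the leading minus sign follows from (4.5.3), and no correction is made
for points near the edge of the unit square. Ties are assumed absent.

```python
import numpy as np

def scaled_ranks(v):
    """rank(v_k) / (n + 1) for each observation."""
    v = np.asarray(v, dtype=float)
    return (np.argsort(np.argsort(v)) + 1.0) / (v.size + 1.0)

def knn_density(points, k):
    """k / (n * area of the disc reaching the k-th nearest neighbour)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    radius = np.sort(dist, axis=1)[:, k]     # column 0 is the point itself
    return k / (points.shape[0] * np.pi * radius**2)

def entropy_statistic(x, y, k=5):
    """Minus the average log of the raw dependence density at the rank points."""
    pts = np.column_stack([scaled_ranks(x), scaled_ranks(y)])
    return -np.mean(np.log(knn_density(pts, k)))
```

Values near zero are what one would expect under independence, while strong
dependence concentrates the rank points and drives the statistic well below
zero.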

A Monte Carlo study of $H(\tilde d)$ and $H(\hat d)$ has been carried out for
100 iterations of samples of size 50 and 100 for a bivariate normal
distribution and a distribution composed of one standard normal
marginal and another marginal corresponding to a conditional
Cauchy(0,1) distribution. The sensitivity of Pearson's r to outliers
is well documented and hence is not investigated here. The
nonparametric procedures have well known robustness properties which
are mimicked by the entropy statistics since the latter are
constructed using ranks of a "trimmed" data set. (A discussion of the
computer algorithm generating $H(\tilde d)$ and $H(\hat d)$ may be found in
the next chapter.) Table 3 presents several quantile values for the various
entropy statistics for the sample sizes 50 and 100 obtained from the
simulations. The notation H(d8) refers to an order 8 expansion,
meaning that all 8 bivariate combinations of the indices (-1,0,1) were
included. H(d24) contains all 24 combinations of (-2,-1,0,1,2), etc.
Note that in each case the index (0,0) is excluded since the constant
term has been incorporated into the integrating factor.

Table 3. Quantiles for Entropy Statistics

   n     p       H(d)    H(d8)   H(d24)   H(d48)   H(d0)-H(d)

  50   0.01    -0.222   -0.229   -0.293   -6.289     -0.206
       0.05    -0.137   -0.179   -0.226   -5.294     -0.163
       0.10    -0.100   -0.169   -0.202   -4.477     -0.144
       0.25    -0.067   -0.145   -0.179   -2.838     -0.102

 100   0.01    -0.251   -0.169   -0.197   -0.248     -0.029
       0.05    -0.181   -0.139   -0.179   -0.204     -0.002
       0.10    -0.157   -0.128   -0.163   -0.197      0.018
       0.25    -0.132   -0.107   -0.149   -0.174      0.048

Power studies were conducted for various values of $\rho$ for the
normal sample and for a sample $(X_1,Y_1), \ldots, (X_n,Y_n)$ generated by
Y = X + C where X is a standard normal random variable and C is a Cauchy(0,1)
random variable. This model corresponds to a general regression model
with Cauchy errors. It may be shown that Y is positively regression
dependent on X since the conditional distribution of Y given X = x is a
Cauchy with median x. Hence, only the normal theory statistic r will
have assumptions violated for this case.

The results for the power study may be found in Table 4. For
n = 50, the entropy statistics are disappointing in comparison with the
correlation statistics for local alternatives to $\rho = 0$ in the normal
case, although H(d) is fairly competitive. This suggests that the
density estimation approach should not be recommended for small
samples, particularly using the numerical integration statistics. One
suspects that the problem of extrapolation has unduly weakened the
effect of the statistics, while for H(d) no extrapolation is
attempted. For n = 100, the results are more promising, and for the
non-normal case, the entropy statistics perform well compared to
Spearman's Rho and Kendall's Tau, and, as expected, greatly surpass
the normal theory statistic r. For this study, we also included the
goodness-of-fit statistic

$$H(d_0) - H(d) = -(1/2) \log(1 - r^2) - H(d) \qquad (4.5.7)$$

whose quantile values obtained for the normal cases are also given in
Table 3. The power result of .88 suggests that this entropy based
statistic may be competitive with existing procedures for testing
bivariate normality. However, further simulations are warranted.

Table 4. Monte Carlo Results for Power Study of Measures of Association
(significance level = 0.10)

   N     RHO       r     ρ_n    τ_n    H(d)   H(d8)  H(d24)  H(d48)

  50     0.2     0.49   0.46    ...   0.31   0.11    0.22    0.16
         0.4     0.90   0.91   0.91   0.58   0.21    0.23    0.09
         0.6     1.00   1.00   0.93   0.71   0.31    0.32    0.15

 100     0.2     0.67   0.66   0.65   0.37   0.18    0.17    0.19
         0.4     0.98   0.98   0.98   0.53   0.51    0.48    0.42
         0.6     1.00   1.00   1.00   0.98   0.91    0.90    0.86

 100   Cauchy    0.32   1.00   1.00   0.90   0.95    0.96    0.94

For Cauchy sample, the g.o.f. statistic H(d-Normal) - H(d) has power 0.88.

The poor performance of the entropy statistics suggests that some
modification be employed to overcome the consistency and power
problems. Rather than employ a numerical Riemann integral, an
alternate approach is to consider the numerical Lebesgue integral for
orders 8, 24, and 48. Recall, the raw entropy statistic based on the
nearest neighbor estimate is a Lebesgue integral w.r.t. the empirical
c.d.f. To obtain a numerical approximation of the Lebesgue integral
for the minimum information estimators, the estimated dependence
density is evaluated at a k by k grid of points in the unit square and
then is treated as a vector of dimension $k^2$. One then forms the
corresponding vector of d log(d) values and obtains a robust measure
of location such as the trimean that serves as a numerical
approximation to the Lebesgue integral. The amount of calculation
involved prohibits a large scale simulation of this approach, but
limited experience with some of the data sets considered in Chapter 6
is promising, at least for cases involving small correlations. The
corresponding use of quantile techniques to analyze the vector of d
and d log(d) values may aid in determining an appropriate order of
expansion.
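
The trimean summary just described can be sketched as follows. The grid of
cell midpoints, the function names, and the convention of returning minus the
trimean (so that the result is on the scale of H(d)) are assumptions made for
illustration; `d_hat` stands for any fitted dependence density that accepts
arrays of coordinates and returns strictly positive values.

```python
import numpy as np

def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    return (q1 + 2.0 * q2 + q3) / 4.0

def entropy_by_trimean(d_hat, k=25):
    """Evaluate d_hat on a k-by-k grid and summarize the d log d values."""
    g = (np.arange(k) + 0.5) / k
    u1, u2 = np.meshgrid(g, g, indexing="ij")
    d = np.asarray(d_hat(u1.ravel(), u2.ravel()), dtype=float)  # dimension k**2
    return -trimean(d * np.log(d))
```

For the independence density, which is identically one on the unit square,
every d log(d) value is zero and the summary is zero as well.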

Without supporting theory, a general simulation study for a wide
class of alternatives to bivariate normality is infeasible. The
results of Vasicek (1976) are promising but need to be extended to the
bivariate case. For the simulations performed, the nearest neighbor
density $\tilde d$ was computed using k(n) = 5. In practice, one might try a
variety of values of k(n) to arrive at a "pleasing" shape, and then
examine  the  entropy  measures.  Unfortunately,  this  subjective  approach 
cannot  be  incorporated  into  a  simulation  study. 


4.6 Other Applications


4.6.1 Nonparametric Regression


Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be a random sample from a bivariate
distribution with c.d.f. $F_{X,Y}$, p.d.f. $f_{X,Y}$, and associated
marginal and conditional functions with the usual notation. One often
attempts to discern a relationship between X and Y in order to predict
Y given a value of X. An important object in this case is the
regression function

$$r(x) = E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, f_{Y|X}(y \mid x) \, dy. \tag{4.6.1}$$


From the definition of the conditional p.d.f. one may express (4.6.1)
as

$$r(x) = h(x) / f_X(x) \tag{4.6.2}$$

where

$$h(x) = \int_{-\infty}^{\infty} y \, f_{X,Y}(x,y) \, dy. \tag{4.6.3}$$


Watson (1964) and Nadaraya (1964), as referenced in Cheng and Taylor
(1980), used representation (4.6.2) and kernel density estimation
results to suggest estimating $r(x)$ by

$$\hat{r}(x) = \hat{h}(x) / \hat{f}(x) \tag{4.6.4}$$

where $\hat{f}(x)$ is the kernel density estimate of $f_X(x)$ given by

$$\hat{f}(x) = \int_{-\infty}^{\infty} \frac{1}{h(n)} K\!\left[\frac{x-x'}{h(n)}\right] dF_n(x') = \frac{1}{n\,h(n)} \sum_{j=1}^{n} K\!\left[\frac{x-X_j}{h(n)}\right] \tag{4.6.5}$$

and $\hat{h}(x)$ is related to the kernel estimate by

$$\hat{h}(x) = \int\!\!\int y \, \frac{1}{h(n)} K\!\left[\frac{x-x'}{h(n)}\right] dF_n(x',y) = \frac{1}{n\,h(n)} \sum_{j=1}^{n} Y_j \, K\!\left[\frac{x-X_j}{h(n)}\right]. \tag{4.6.6}$$

Rosenblatt  (197')  gives  properties  of  r (x)  and  Cheng  and  Taylor  (1980) 
extend  these  to  a  more  general  case  of  a  k-dimensional  X-vector.  This 
technique  is  completely  general  and  can  be  thought  of  as  taking  a 
weighted  average  of  Y  values  based  on  the  X  observations.  This  is 
easily seen by expressing (4.6.4) as

$$\hat{r}(x) = \sum_{j=1}^{n} K\!\left[\frac{x-X_j}{h(n)}\right] Y_j \Big/ \sum_{j=1}^{n} K\!\left[\frac{x-X_j}{h(n)}\right]. \tag{4.6.7}$$

The kernel K acts as a focusing function giving more weight to $Y_j$
values for $X_j$ in a neighborhood of x.
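
The weighted-average form (4.6.7) translates almost directly into code. The sketch below is illustrative only: the Gaussian kernel, the function names, and the fixed `bandwidth` argument (playing the role of h(n)) are assumptions, not choices made in the text.

```python
import math

def gaussian_kernel(t):
    """Standard normal density, used here as one possible kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kernel_regression(x, xs, ys, bandwidth, kernel=gaussian_kernel):
    """Weighted average of the Y values as in (4.6.7):
    sum_j K[(x - X_j)/h] Y_j / sum_j K[(x - X_j)/h]."""
    weights = [kernel((x - xj) / bandwidth) for xj in xs]
    total = sum(weights)
    if total == 0.0:
        # Guard against division by zero when x is far from every X_j.
        raise ValueError("all kernel weights are zero at x; increase bandwidth")
    return sum(w * yj for w, yj in zip(weights, ys)) / total
```

For example, `kernel_regression(0.0, [-1.0, 1.0], [0.0, 2.0], 1.0)` weights the two observations equally and returns their average.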

The  representation  (4.6.4)  suggests  a  multitude  of  estimators 
based on the various nonparametric density estimation techniques of
Chapter 3. Asymptotic properties may be intractable for many of these
cases,  however.  Nonetheless,  one  may  seek  to  rewrite  (4.6.4)  to 
permit  application  of  some  of  the  quantile  based  techniques  mentioned 
previously. 

In Chapter 2 we observed that (4.6.1) could be translated to a
regression quantile function by the formula

$$rQ_X(u_1) = \int_0^1 Q_Y(u_2) \, d(u_1,u_2) \, du_2, \tag{4.6.8}$$

where $d(u_1,u_2)$ is the dependence density. This formula is derived
from (4.6.2) with the transformation $x=Q_X(u_1)$ and $y=Q_Y(u_2)$. One
obtains

$$rQ_X(u_1) = \int_0^1 Q_Y(u_2) \, [f_{X,Y}(Q_X(u_1),Q_Y(u_2)) / f_X(Q_X(u_1))] \, dQ_Y(u_2)$$

$$= \int_0^1 Q_Y(u_2) \, fQ_Y(u_2) \, d(u_1,u_2) \, q_Y(u_2) \, du_2 \tag{4.6.9}$$

which then simplifies to (4.6.8) by virtue of the reciprocal identity.

Estimation of (4.6.8) may be obtained by numerically integrating
products of sample quantile and dependence density functions. For
example, selecting a grid $u_{21},\ldots,u_{2m}$ of equally spaced values
between 0 and 1, one obtains the Riemann sum

$$\widehat{rQ}_X(u_1) = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}_Y(u_{2j}) \, \hat{d}(u_1,u_{2j}) \tag{4.6.10}$$

as an estimator of the regression function. A regression curve may
then be plotted for various values of $u_1$.
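
A minimal sketch of the Riemann sum (4.6.10) follows. It assumes a raw (order statistic) sample quantile function and midpoint grid values, and it takes the estimated dependence density as a user-supplied function `d_hat`; these names and choices are hypothetical and serve only to illustrate the computation.

```python
import math

def sample_quantile(sorted_ys, u):
    """Raw sample quantile: the order statistic Y_(j) with (j-1)/n < u <= j/n."""
    n = len(sorted_ys)
    j = max(1, min(n, math.ceil(u * n)))
    return sorted_ys[j - 1]

def regression_quantile(u1, ys, d_hat, m=100):
    """Riemann sum (1/m) * sum_j Q_Y(u_2j) * d(u1, u_2j) over m equally
    spaced values u_2j in (0, 1)."""
    sorted_ys = sorted(ys)
    total = 0.0
    for j in range(1, m + 1):
        u2 = (j - 0.5) / m  # midpoint of the j-th subinterval (an assumption)
        total += sample_quantile(sorted_ys, u2) * d_hat(u1, u2)
    return total / m
```

When `d_hat` is identically 1 (independence), the sum reduces to an average of sample quantiles of Y and the estimated regression curve is flat in `u1`, as one would expect.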

Parzen (1979a) emphasizes the conditional quantile function in
approaches to nonparametric regression. He considers versions of
(4.6.7) in the quantile domain with emphasis on smoothing raw
regression function estimates based on empirical quantile functions.
In Parzen (1977), raw estimates of the partial derivative of $D(u_1,u_2)$
w.r.t. $u_1$ provide alternatives to (4.6.10) for estimating $rQ_X(u_1)$, but
this is proposed only as a "quick and dirty" technique. Our emphasis
on obtaining smooth estimates of $d(u_1,u_2)$ should make (4.6.10) the
preferred estimator of the regression quantile function, but in any
case, asymptotic properties remain to be investigated.

We have only examined nonparametric regression from a density
estimation approach. Huber (1981) suggests robust least squares
procedures, and Hajek and Sidak (1967) consider some linear rank tests
for hypothesis testing concerning linear regression coefficients. We
have avoided any assumption of linearity in our discussion, but when
such an assumption is justified, nonparametric approaches to the
linear regression problem may be preferred. In a pure modeling
approach, one would desire some sort of residual analysis that tests
for white noise of residuals in an effort to evaluate the model given
by (4.6.10).

4.6.2 Discrimination and Classification

Often  one  seeks  to  classify  an  individual  with  bivariate 
characteristics  (X,Y)  into  one  of  two  parent  populations.  If  data  is 
available  on  each  population,  one  seeks  to  classify  (X,Y)  into  the 
population  that  seems  most  closely  related  to  (X,Y) .  If  the  bivariate 
p.d.f.'s $f_1(x,y)$ and $f_2(x,y)$ of populations 1 and 2, respectively, are
known, one forms the likelihood ratio

$$\lambda(X,Y) = f_1(X,Y) / f_2(X,Y) \tag{4.6.11}$$

and classifies (X,Y) into population 1 if $\lambda(X,Y) \ge 1$ and into population
2 otherwise. If the two populations have normal distributions with
common unknown covariance matrix $\Sigma$ and different unknown mean vectors
$\mu_1$ and $\mu_2$, one obtains the corresponding sample estimates of these
quantities and forms the sample discriminant function

$$W = [\mathbf{X} - (1/2)(\bar{\mathbf{X}}_1 + \bar{\mathbf{X}}_2)]' \, S^{-1} \, [\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2] \tag{4.6.12}$$

where $\mathbf{X}' = (X,Y)$, $\bar{\mathbf{X}}_1' = (\bar{X}_1,\bar{Y}_1)$, etc. One then assigns (X,Y) to
population 1 if W>0 and to population 2 otherwise. Morrison (1976)
gives an adequate description of this normal theoretic approach.
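
The sample discriminant function (4.6.12) can be sketched for the bivariate case as below. The pooled covariance estimate with n1+n2-2 degrees of freedom is the usual choice for S but is an assumption here, as are the function names; samples are lists of (x, y) pairs.

```python
def mean_vector(sample):
    """Componentwise mean of a list of (x, y) pairs."""
    n = len(sample)
    return (sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n)

def pooled_covariance(s1, s2):
    """Pooled estimate (sxx, sxy, syy) of the common covariance matrix."""
    sxx = sxy = syy = 0.0
    for sample in (s1, s2):
        mx, my = mean_vector(sample)
        for x, y in sample:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(s1) + len(s2) - 2
    return sxx / dof, sxy / dof, syy / dof

def discriminant_w(point, s1, s2):
    """W = [X - (1/2)(Xbar1 + Xbar2)]' S^{-1} [Xbar1 - Xbar2]."""
    m1, m2 = mean_vector(s1), mean_vector(s2)
    sxx, sxy, syy = pooled_covariance(s1, s2)
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("pooled covariance matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # b = S^{-1} (Xbar1 - Xbar2), using the closed-form 2x2 inverse.
    b0 = (syy * dx - sxy * dy) / det
    b1 = (-sxy * dx + sxx * dy) / det
    cx = point[0] - 0.5 * (m1[0] + m2[0])
    cy = point[1] - 0.5 * (m1[1] + m2[1])
    return cx * b0 + cy * b1
```

A point is assigned to population 1 when the returned value is positive and to population 2 otherwise; a point midway between the two sample means gives W = 0.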

If the assumption of bivariate normality cannot be justified, one
may seek to estimate the unknown densities of each population to form

$$\hat{\lambda} = \hat{f}_1(X,Y) / \hat{f}_2(X,Y) \tag{4.6.13}$$


and proceed as before. An alternate approach is suggested by working in
the quantile domain. For a sample from population 1, one estimates
$d_1(u_1,u_2)$ and univariate c.d.f.'s $F_X$ and $F_Y$. For population 2,
estimates of $d_2(u_1,u_2)$, $G_X$, and $G_Y$ are obtained. One then forms

$$\hat{\lambda} = \hat{d}_1(\hat{F}_X(X),\hat{F}_Y(Y)) / \hat{d}_2(\hat{G}_X(X),\hat{G}_Y(Y)) \tag{4.6.14}$$


and proceeds accordingly. Observe, if in sample 1,
$X_{(k)} \le X \le X_{(k+1)}$, then $\hat{F}_X(X) = k/(n+1)$ is an acceptable raw estimate for
$u_1 = F_X(X)$. One may prefer using estimates of the bivariate
density-quantile function over using estimates of the dependence
density. Such an approach, however, seems to assume equal marginals
for both populations. While probabilities of misclassification based
on (4.6.14) may seem difficult to obtain, this approach "exhausts the
data" by utilizing all of the relevant sample functions (if only
indirectly) in creating the discriminant function. Thus, the approach
would seem more sensitive than an approach dealing only with
likelihood ratios in the density domain, and hence one might expect
small probabilities of misclassification using this technique. This
remains an open research question.
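
The quantile-domain ratio (4.6.14) with raw c.d.f. estimates k/(n+1) might be computed along the following lines. The dependence density estimates `d1_hat` and `d2_hat` are taken as given functions; clamping k to at least 1, so that the estimate stays inside (0,1), is an added safeguard and not part of the text.

```python
def raw_cdf(sample, x):
    """Raw estimate k/(n+1), where k counts the sample values <= x."""
    n = len(sample)
    k = sum(1 for v in sample if v <= x)
    k = min(max(k, 1), n)  # keep the estimate strictly inside (0, 1)
    return k / (n + 1)

def quantile_domain_ratio(point, sample1, sample2, d1_hat, d2_hat):
    """Ratio d1(F_X(x), F_Y(y)) / d2(G_X(x), G_Y(y)) for a new point (x, y).

    sample1 and sample2 are lists of (x, y) pairs from populations 1 and 2;
    d1_hat and d2_hat are estimated dependence densities on the unit square.
    """
    x, y = point
    u1 = raw_cdf([p[0] for p in sample1], x)
    u2 = raw_cdf([p[1] for p in sample1], y)
    v1 = raw_cdf([p[0] for p in sample2], x)
    v2 = raw_cdf([p[1] for p in sample2], y)
    return d1_hat(u1, u2) / d2_hat(v1, v2)
```

The point would then be assigned to population 1 when the ratio is at least 1, in parallel with the rule used for (4.6.11).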

4.6.3 Parametric Modeling

A useful extension of the minimum information density estimation
technique concerns estimating parameters of a parametric model for the
purpose of ascertaining the adequacy of the model. Consider the
canonical exponential model of order m given by

$$\log f(x) = \sum_{k=1}^{m} \theta_k T_k(x) - C(\theta_1,\ldots,\theta_m), \tag{4.6.15}$$

where $T_1(x),\ldots,T_m(x)$ are called sufficient statistics. This density
maximizes entropy subject to the constraints

$$\int_{-\infty}^{\infty} T_k(x) f(x) \, dx = \tau_k, \quad k=1,\ldots,m, \tag{4.6.16}$$

where $\tau_1,\ldots,\tau_m$ are called moment parameters. This implies that the
normal  distribution  maximizes  entropy  over  all  other  distributions 
with  specified  mean  and  variance.  Using  the  minimum  information 
approach for the parametric model (4.6.15), one obtains least squares
estimates  for  the  model  parameters  which  can  lead  to  estimates  of  the 
moment  parameters.  Recall,  from  the  theory  of  exponential  models,  one 
has 


$$\frac{\partial}{\partial\theta_k} C(\theta_1,\ldots,\theta_m) = \tau_k, \tag{4.6.17}$$

and hence under suitable regularity conditions the moment estimators

$$\hat{\tau}_k = \frac{1}{n} \sum_{j=1}^{n} T_k(X_j), \quad k=1,\ldots,m, \tag{4.6.18}$$

are also maximum likelihood estimators and thus the estimators

$$\hat{\theta}_k = \theta_k(\hat{\tau}_1,\ldots,\hat{\tau}_m), \quad k=1,\ldots,m, \tag{4.6.19}$$

are m.l.e.'s by the invariance property. Hence, one may form the
least squares estimates and compare these to the m.l.e.'s for a
diagnostic check of the adequacy of the parametric model given by
(4.6.15).

Example 4.6.1 For the normal case, $T_1(x)=x$ and $T_2(x)=x^2$, so that
$\tau_1=\mu$ and $\tau_2=\mu^2+\sigma^2$. The canonical parameters are given by $\theta_1=\mu/\sigma^2$
and $\theta_2=-1/(2\sigma^2)$. The regression approach would form estimators of the
parameters in the model

$$\log f(x) = \theta_0 + \theta_1 x + \theta_2 x^2 \tag{4.6.20}$$

with  a  stochastic  element  introduced  when  the  nearest  neighbor 
estimate  replaces  f (x)  in  the  model. 
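
The diagnostic comparison for the normal case can be sketched as follows: the m.l.e.'s of the canonical parameters come from the sample moments, while the least squares estimates come from regressing log density values on 1, x, and x^2. In this illustration the log density values are simply passed in as a list; in practice they would be logs of a nonparametric estimate. All function names are hypothetical.

```python
def mle_canonical_normal(data):
    """m.l.e.'s of theta_1 = mu/sigma^2 and theta_2 = -1/(2 sigma^2),
    obtained from the moment estimators of tau_1 and tau_2."""
    n = len(data)
    tau1 = sum(data) / n
    tau2 = sum(x * x for x in data) / n
    var = tau2 - tau1 * tau1
    return tau1 / var, -1.0 / (2.0 * var)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ls_canonical_normal(xs, log_f):
    """Least squares fit of log f = t0 + t1*x + t2*x^2 via the normal
    equations; returns (t1, t2)."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * v for r, v in zip(basis, log_f)) for i in range(3)]
    t0, t1, t2 = solve_linear(ata, atb)
    return t1, t2
```

Close agreement between the two pairs of estimates would support the normal model; marked disagreement would suggest that the model is inadequate.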

Example 4.6.2 For a gamma model with density

$$f(x) = [b^a/\Gamma(a)] \, x^{a-1} \exp(-bx), \tag{4.6.21}$$

where x>0, a,b>0, the canonical form is

$$\log f(x) = \theta_0 + \theta_1 \log x + \theta_2 x, \tag{4.6.22}$$

where $T_1(x)=\log x$ and $T_2(x)=x$ with $\theta_1=(a-1)$ and $\theta_2=-b$. For a
location-scale  gamma  model,  problems  are  encountered  unless  the 
location  parameter  is  known.  For  unknown  location,  one  estimates  it 
by  the  minimum  value  of  the  data  and  then  treats  this  as  the  true 
parameter  value  to  be  able  to  obtain  the  least  squares  estimates  for  a 
and  b. 
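
For reference, the canonical form of the gamma model follows from taking logarithms of the density term by term. The display below is a worked restatement under the parameterization of Example 4.6.2 (shape a, rate b); the expressions for the moment parameters are standard facts about the gamma distribution, with $\psi$ denoting the digamma function, and are not taken from the text.

```latex
\log f(x) = \underbrace{a \log b - \log \Gamma(a)}_{\theta_0}
          + \underbrace{(a-1)}_{\theta_1} \log x
          + \underbrace{(-b)}_{\theta_2} \, x ,
\qquad
\tau_1 = E[\log X] = \psi(a) - \log b ,
\quad
\tau_2 = E[X] = a/b .
```

The least squares estimates of $\theta_1$ and $\theta_2$ therefore yield estimates of a and b directly, which may be compared with those implied by the sample means of log X and X.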

This  parametric  modeling  approach  will  be  illustrated  in  Chapter 
6  applied  to  normal  and  gamma  data.  Note  that  this  technique  is  not 
recommended  for  parameter  estimation  when  the  model  is  known  to  be 
valid,  but  instead  is  suggested  as  a  method  for  checking  the  adequacy 
of  a  model.  One  may  wish  to  investigate  distributional  properties  to 
suggest inferential goodness-of-fit procedures. Bivariate extensions
are  fairly  straightforward  and  will  not  be  considered  here. 

4.7 Concluding Remarks

Perhaps the greatest weakness of nonparametric statistics until
now  has  been  its  failure  to  adequately  handle  multivariate  problems. 
The  problem  seems  to  center  around  the  insistence  upon  carrying  out 
inferential  procedures  on  parameters  of  a  probability  model  and  the 
inability to nonparametrically estimate these parameters. For
example, the contrasts of interest in an analysis of variance setting
often rely on robustness properties in the absence of nonparametric
multiple comparison procedures. Heuristic solutions (such as
replacing  the  data  by  ranks)  may  appear  to  work  in  some  cases,  but 
further  study  is  warranted. 

We  have  followed  the  recent  approach  of  placing  function 
approximation  ahead  of  parameter  estimation.  Naturally  this  is  a  more 
difficult  estimation  problem,  but  once  solved,  it  leads  directly  to  a 
solution  to  estimating  parameters  of  interest.  Unfortunately,  most 
techniques  must  reside  in  the  category  of  the  exploratory  rather  than 
confirmatory  because  of  the  lack  of  theory  to  support  the  procedure. 
The  acceptance  of  Monte  Carlo  studies  has  gradually  improved  over  the 
years,  but  unfortunately,  function  approximation  does  not  lend  itself 
to  general  Monte  Carlo  experimental  designs  especially  when  seeking 
comparisons  for  a  wide  class  of  alternatives.  Many  of  the  expansion 
techniques  have  adequate  theoretical  motivation,  the  main  problem 
being  that  of  order  determination.  Consequently,  techniques  such  as 
the  autoregressive  approach  of  Parzen  or  the  orthogonal  expansion 
technique  of  Kronmal  and  Tartar  that  exhibit  objective  order 
determining criteria are the most promising of those suggested in the
literature.  This  motivated  the  expansion  techniques  considered  in 
this  chapter.  Unfortunately,  the  criteria  employed  did  not  seem  to 
perform  as  well  as  hoped,  but  once  a  suitable  criterion  is  obtained, 
the  minimum  information  technique  will  become  even  more  useful  than 
approaches  such  as  the  kernel  method  that  necessitate  examining 
multitudes  of  shapes  to  arrive  at  a  conclusion.  Furthermore,  the 
generalization  of  methods  based  on  the  dependence  density  to  a 
multivariate setting is fairly straightforward, especially in the
nonparametric regression framework. The sample size required in a
multivariate  setting  to  adequately  perform  function  approximation  will 
always  pose  a  serious  problem,  however,  and  hence  parametric 
approaches  will  continue  to  dominate  small  sample  settings  when  one 
can  justify  the  assumed  model  to  any  degree  of  satisfaction. 

5. COMPUTER SOFTWARE FOR BIVARIATE DATA ANALYSIS




5.1 Introduction

The  major  statistical  computer  packages  have  yet  to  fully  enter 
the fields of nonparametric density estimation or bivariate data
analysis. Consequently, one must create his own computer programs to
carry out many of the procedures detailed in this work. For
nonparametric density estimation, most statistical packages will have
histogram procedures, but only IMSL (International Mathematical and
Statistical Libraries) provides routines to do other types of
nonparametric density estimation. When one ventures too far from
classical  normal  theory  procedures  or  some  of  the  more  popular 
nonparametric  techniques,  the  existing  statistical  software  packages 
are  of  little  help. 

Ideally,  the  examination  of  density  curves  is  carried  out  in  an 
interactive  computing  environment  so  that  shapes  can  be  examined  and 
adjusted  quickly  to  arrive  at  an  "optimal"  choice  for  the  estimated 
density.  However,  the  programs  we  will  discuss  were  written  in 
FORTRAN  for  batch  processing.  This  was  done  for  a  variety  of  reasons 
which  will  not  be  discussed  here.  The  translation  of  FORTRAN  code 
into  an  interactive  language  such  as  BASIC  is  not  too  difficult,  and 
some  systems  have  time  sharing  FORTRAN  capabilities.  The  system 
employed  for  our  procedures  has  a  "simulated"  interactive  language 
that  permits  quick  access  to  batch  output  at  a  CRT  terminal.  The 
computing  environment  for  program  implementation  will  be  discussed 
later.

The  writing  of  computer  programs  is  considered  to  be  an  art  by 
some, and a particular program often mirrors something of the
personality  of  its  creator.  Thus,  examination  of  the  code  that  we 
have  written  illustrates  a  certain  philosophy  of  programming  that  will 
be  discussed  in  the  next  section.  The  actual  programs  that  we  have 
written will be discussed in sections 5.3 and 5.4. We will conclude
this  chapter  by  examining  the  facilities  that  were  used  and  the 
typical  effort  required  to  execute  a  program  and  retrieve  the  results. 

5.2  A  Philosophy  of  Statistical  Computing 

There  are  many  ways  to  attack  the  writing  of  computer  code  to 
carry  out  some  desired  purpose.  The  recent  popularity  and  utility  of 
structured  programming  has  caused  it  to  be  a  widely  practiced  form  of 
program  construction.  The  idea  behind  this  approach  is  to  carefully 
organize  a  program  so  that  it  flows  smoothly  from  one  computation  to 
the  next  without  haphazard  placement  of  loops  and  branches.  There  are 
a  variety  of  ways  to  organize  a  program  with  this  approach  in  mind. 

One  method  is  to  create  a  bank  of  subroutines  each  of  which  is 
carefully  designed  to  carry  out  a  specific  task,  and  then  write  a 
fairly  terse  main  program  that  systematically  accesses  these  routines. 
Using  this  approach,  one  may  discover  that  efficient  routines  already 
exist  that  perform  certain  tasks,  and  hence  one  need  not  expend  effort 
in  creating  the  routine  oneself.  The  IMSL  FORTRAN  subroutine  library 
contains  many  useful  techniques  backed  by  extensive  testing  that  could 
probably not be matched by a programmer with limited resources. Many
systems maintain a variety of subroutine libraries and catalogued
procedures that may be useful to programmers. With a collection of
tested subroutines at one's disposal, the trauma of debugging a large
program  is  greatly  reduced.  With  this  in  mind,  one  may  wish  to  insert 
checks  and  flags  in  a  routine  to  guard  against  its  misuse  in  later 
applications.  An  alternate  philosophy  adopted  by  some  is  to  write 
completely  self  contained  main  programs  that  systematically  perform 
every task in the main body of the program. Careful documentation of
such programs makes it easy for one to examine the code to discover
what  tasks  the  program  is  performing.  Arguments  do  not  have  to  be 
passed  back  and  forth  to  subroutines  and  array  dimensioning  is  handled 
only  once  without  the  need  to  trace  dimension  values  throughout  a 
program.  While  this  approach  has  some  advantages,  the  major  weakness 
is that a great deal of repetition may occur in writing the program so
that  less  effort  may  be  spent  thinking  of  new  approaches  or  ways  of 
making  the  program  more  efficient.  We  favor  the  subroutine  approach 
as  one  will  soon  realize  upon  examining  our  programs. 

There are three goals inherent in computer program construction.

1) The program should correctly perform the task for which it was
intended.

2) The program should work, i.e., it should be reliable, anticipating
any awkward contingencies that might occur.

3) The program should be easy to read and easy to maintain.

A fourth goal is often added to this list.

4) The program should be portable, i.e., it should be designed to work
in a general computing environment for a variety of computer
systems.

Using  a  popular  language  such  as  FORTRAN  or  COBOL  and  avoiding  machine 
dependent  conventions  should  promote  portability,  although  most  likely 
some  translation  will  be  needed  when  going  from  one  system  to  another. 
Since  we  anticipate  that  our  programs  may  be  used  at  more  than  one 
computer  installation,  we  have  made  some  attempt  to  avoid  machine 
dependent  conventions. 

The  goals  presented  above  are  important,  but  as  with  the  theory 
vs.  simulation  dilemma,  one  can  never  anticipate  the  infinite 
possibility  of  data  sets  that  may  be  exposed  to  a  program.  Therefore, 
anticipating  such  problems  as  division  by  zero  may  promote  the 
efficient  use  of  a  program,  especially  if  an  error  might  occur  in  a 
minor  step  not  crucial  to  the  general  task  of  the  program.  Insertion 
of  options  that  default  to  logical  values  will  also  help  insure  that  a 
program  completes  its  task  despite  minor  errors  of  no  consequence. 

A  wide  variety  of  computer  languages  exist  each  aimed  at 
emphasizing a particular application. FORTRAN is designed for
scientific computations while COBOL is geared more towards business
applications. Thus COBOL may be better suited for character
manipulation while FORTRAN might be preferred for "number crunching".
Since our main goal is one of computation, we have chosen to write our
programs in FORTRAN. Also, as suggested, a FORTRAN program may be
written to be fairly portable. The only problems might occur in
certain format conventions or specialized functions such as array
manipulation. Furthermore, the popularity of FORTRAN will make the
code understandable to most users in case they wish to make
modifications tailored to their own specialized applications.
Although many options are included in our programs, a user may wish to
add  or  delete  options  to  reflect  the  specialized  environment  in  which 
he  is  working.  The  FORTRAN  language  has  many  qualities  to  recommend 
it  which  need  not  be  discussed  here. 

Finally,  we  note  that  while  efficiency  is  always  an  important 
concern  especially  in  the  construction  of  a  large  program,  one  may  be 
more  concerned  with  documenting  and  organizing  a  program  so  that  it 
may  easily  be  used  by  others.  Naturally,  one  seeks  a  suitable 
compromise  between  efficiency  and  ease  of  use  so  that  a  program  is  not 
prohibitively  expensive  or  time  consuming.  Our  first  concern  is  for 
accuracy  and  precision.  When  these  attributes  are  sufficiently 
safeguarded,  then  one  may  search  for  ways  to  make  a  program  more 
efficient.  Clearly  one  does  not  desire  a  program  that  quickly  and 
efficiently  computes  the  wrong  answer,  although  this  is  a  common 
occurrence in computer applications. More insight may be gained into
this  philosophy  by  a  closer  inspection  of  the  routines  we  have 
written.  Some  comments  will  be  made  about  obstacles  that  had  to  be 
overcome,  and  references  will  be  made  to  the  authors  of  contributing 
routines.  For  a  discussion  of  statistical  computing  one  may  consult 
Kennedy  and  Gentle  (1980) ,  and  for  a  general  discussion  of  the  science 
and art of computing, Knuth (1968) is a useful reference.

5.3 Univariate Density Estimation Routines

The  popularity  of  the  histogram  makes  it  readily  available  from 
many  statistical  computer  packages.  The  Statistical  Analysis  System 
(SAS,  1979)  provides  histograms  in  its  CHART  procedure.  The  BMDP 
Biomedical  Computer  Programs  (BMDP,  1979)  provide  two  programs,  BMDP2D 
and  BMDP5D,  that  produce  histograms  for  a  data  set.  MINITAB  (Ryan, 
Joiner,  and  Ryan,  1975)  has  a  command  HISTOGRAM  that  will  produce  a 
histogram  for  a  specified  data  vector.  The  Statistical  Package  for 
the Social Sciences (SPSS, see Nie, et al., 1975) provides a histogram
through the procedure FREQUENCIES. All of these routines have an
objective default for computing cell widths and boundaries. For
example, PROC CHART of SAS by default will let m=FLOOR[1+3.3 log(n)]
where FLOOR is the greatest integer function and n is the sample size.
The range of the data is then divided into m equally spaced intervals
yielding h=range/m.
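
The default cell rule quoted above is simple to reproduce. The sketch below assumes that log denotes the base 10 logarithm (which makes the rule close to Sturges' rule); the text does not state the base, and the function name is hypothetical. It is not a description of how SAS itself is implemented.

```python
import math

def default_cells(data):
    """Number of cells m = FLOOR[1 + 3.3 log10(n)] and cell width h = range/m."""
    n = len(data)
    m = math.floor(1.0 + 3.3 * math.log10(n))  # base 10 is an assumption
    h = (max(data) - min(data)) / m
    return m, h
```

For a sample of size n=100 the rule gives m = FLOOR[7.6] = 7 cells.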

The  histogram  is  the  only  form  of  density  estimation  available 
from  most  packages.  IMSL  has  two  routines  that  offer  alternatives  to 
the  histogram,  but  it  is  the  only  major  source  of  such  routines. 

Kernel  estimation  is  performed  by  the  IMSL  routine  NDKER  for  user 
provided  kernel  and  specified  window  width.  The  routine  NDMPLE 
performs  discrete  maximum  penalized  likelihood  density  estimation  for 
user  specified  smoothing  parameter.  These  routines  are  well 
documented  and  easy  to  use. 

For alternatives to the existing software, one may consult the
literature to obtain algorithms to be programmed. The moment
techniques of Cencov (1962) are easily programmed once one has
surmounted the problem of generating orthogonal systems of functions
in $L^2$ space. Tartar and Kronmal (1976) provide a very readable
account  of  implementing  density  estimation  techniques.  Based  on  these 
references  and  the  use  of  such  numerical  algorithm  sources  as 
Abramowitz  and  Stegun  (1972),  one  may  readily  construct  FORTRAN 
routines  to  perform  density  estimation. 

For  the  minimum  information  techniques  developed  in  Chapter  4, 
one  may  use  existing  regression  software  to  implement  the  procedures. 
However,  for  complex  exponential  systems  some  adjustments  may  have  to 
be  made.  The  TIMESBOARD  FORTRAN  library  of  Newton  (1979)  contains 
some  useful  routines  for  handling  complex  regression  in  addition  to 
many  general  purpose  routines.  An  alternative  to  obtaining  FORTRAN 
regression  routines  to  implement  this  procedure  is  to  create  a 
"regression  data  set"  and  use  this  as  input  to  a  procedure  such  as 
PROC  GLM  of  SAS.  Other  regression  software  may  also  be  employed. 

The  author  has  written  five  FORTRAN  routines  to  perform 
univariate nonparametric density estimation. These routines along
with the IMSL routine NDKER will be applied to several data sets in
the next chapter to illustrate their use. The five routines we have
written are called NNDEN (Nearest Neighbor), KTDEN (Kronmal-Tartar
type with trigonometric polynomials), TKDEN (Tartar-Kronmal type with
complex exponentials), MIDEN (Minimum Information type with Legendre
polynomials), and CMPDEN (minimum information type with CoMPlex
exponentials). The routines NNDEN, KTDEN, and TKDEN are easily
written following the algorithms described in the literature and hence
will not be listed here. The routines MIDEN and CMPDEN are listed in
Appendix  A.  Appendices  C  and  D  contain  a  collection  of  subprograms 
accessed  by  these  procedures. 

The  density  estimation  routines  were  written  to  accept  standard 
input  and  produce  standard  output  within  the  routine,  passing  as  few 
arguments  back  and  forth  as  possible.  Our  purpose  for  writing  the 
routines  was  to  get  plotted  output  quickly  and  efficiently  for  a 
variety  of  smoothing  parameter  values.  For  more  practical 
applications,  one  may  wish  to  pass  the  actual  estimated  density  values 
back  to  the  main  program  to  be  used  for  further  investigation  or 
analysis.  This  is  easily  accomplished  by  modifying  the  calling 
arguments.  The  only  problem  might  be  in  controlling  the  values  at 
which  the  density  is  evaluated,  but  routines  are  available  for 
interpolation  if  necessary.  For  the  parametric  orthogonal  expansion 
models, one need only pass parameter estimates with the corresponding
variable indices back to the calling program to be employed as needed.
As suggested, this is an advantage in using expansion techniques,
namely, that one needs only a few parameter estimates to
completely describe the estimated density rather than
function estimates for a large number of values. For example, a
vector of size 10 may store all of the relevant information about an
orthogonal expansion approximation, while vectors of size 50 or more
are usually required for nearest neighbor, kernel, or DMPL type
estimates, depending upon the number of estimated values one wishes.

Typically, the unique feature of each density estimation routine
is the algorithm employed to derive the estimate. Otherwise, each
routine has a general framework. This framework is outlined as
follows:

Input: Data (X), sample size (N), minimum value of data (A or XMIN),
maximum value of data (B or XMAX), and options (IOPTk).

Preprocess Data: If data is modified by an algorithm, let Y(I) = X(I)
and use Y vector in the procedure. If trimming or scaling is
required, perform necessary transformations before exposing
data to the algorithm. (Usually standardizing to an interval
(a,b) is performed within the algorithm to reduce the lines of
computer code required.)

Invoke Algorithm: Expose the (possibly transformed) data to the
algorithm to obtain parameter estimates or estimated density
values. The standard approach is to then obtain density
estimates evaluated at N or 100 equally spaced values between
XMIN and XMAX.

Compute Density Functionals: Two versions of each routine exist, one
applied to data with unknown distribution and one applied to
data with a specified null distribution. For data with unknown
distributions, estimates of the mean, variance, and mode are
obtained by numerical integration and grid search techniques
performed on the estimated density. For a specified null
distribution, estimates of integrated squared error, mean
squared error, and maximum absolute deviation are obtained in a
similar fashion. The options allowed for null distributions
include normal, gamma, and a mixture of two normal densities
since these are the cases considered in Chapter 6.

Plot Density Estimates: For unknown parent distributions, a printer
plot is obtained displaying equally spaced X values, estimated
density values, and the corresponding shape of the estimated
density. For specified null distributions, an overlay plot is
obtained with an additional listing of the null density values.
Examples of these plots will appear in Chapter 6.

The options IOPTk usually involve choices of smoothing parameter
orders  or  null  density  values.  They  may  also  decide  the  number  of 
different  estimates  to  be  obtained.  Output  of  plots  and  parameter 
estimates  is  automatic,  but  the  procedures  may  be  modified  by  the  user 
to  control  output. 
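
The "Compute Density Functionals" step of the framework can be illustrated with a short sketch. It is not a translation of the FORTRAN routines: the trapezoid rule, the renormalization by the integrated mass, and the function names are assumptions, and the mean squared error is omitted because its exact definition is not given here.

```python
def trapezoid(grid, vals):
    """Trapezoid-rule integral of vals over the (possibly uneven) grid."""
    return sum(0.5 * (vals[i] + vals[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def density_functionals(grid, dens):
    """Mean and variance by numerical integration, mode by grid search."""
    mass = trapezoid(grid, dens)  # renormalize in case the estimate does not integrate to 1
    mean = trapezoid(grid, [x * f for x, f in zip(grid, dens)]) / mass
    var = trapezoid(grid, [(x - mean) ** 2 * f for x, f in zip(grid, dens)]) / mass
    mode = grid[max(range(len(grid)), key=lambda i: dens[i])]
    return mean, var, mode

def error_functionals(grid, dens, null_dens):
    """Integrated squared error and maximum absolute deviation of the
    estimate from a specified null density evaluated on the same grid."""
    diffs = [f - g for f, g in zip(dens, null_dens)]
    ise = trapezoid(grid, [e * e for e in diffs])
    mad = max(abs(e) for e in diffs)
    return ise, mad
```

Here `grid` would hold the N or 100 equally spaced values between XMIN and XMAX, and `dens` the estimated density values at those points.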

For  the  NNDEN  procedure,  if  one  wishes  to  obtain  density 
estimates at m points, roughly 4mn+kn+6 computations will be performed
for a data set of size n and smoothing parameter k=k(n). The most
involved  step  is  finding  the  smallest  radius  r  such  that  a  sphere  with 
radius  r  centered  at  the  evaluated  data  point  contains  k-1  additional 
points.  To  perform  this  computation,  (n-1)  radii  are  computed  and  k 
calls  to  routine  MIN  are  made  with  Y(IMIN)  being  replaced  by  a  large 
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positive  value  each  time.  It  is  conceivable  but  unlikely  that  all 
radii  will  be  larger  than  this  value,  which  is  currently  set  at 
7.0E+75*  This  is  near  the  machine  limit  for  single  precision 
constants  and  represents  7  followed  by  75  zeros.  Hence,  it  should  be 
suitable  for  most  data  sets.  In  the  univariate  case,  increasing  the 
value  of  k  does  not  severely  lengthen  the  procedure,  but  the 
multivariate  analog  can  be  unduly  lengthened  by  large  choices  of  k  due 
to  the  involved  computations  for  the  volume  of  a  hypersphere. 
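
The radius search described above can be sketched for the univariate case as follows. This is a minimal Python illustration, not the NNDEN code: it sorts the distances instead of making k successive calls to a minimum-finding routine, and it assumes the normalization k/(2nr), which is one of several conventions in use. The function name `knn_density` is hypothetical.

```python
def knn_density(x, data, k):
    """k-th nearest neighbor density estimate at x for univariate data."""
    n = len(data)
    # Distances from x to every observation; sorting stands in for the
    # repeated minimum search described in the text.
    radii = sorted(abs(x - xi) for xi in data)
    r = radii[k - 1]  # smallest radius whose interval [x - r, x + r] holds k points
    return k / (n * 2.0 * r)  # assumed normalization: k / (n * interval length)

# Illustration: 99 evenly spaced points on (0, 1), evaluated near the center.
data = [i / 100 for i in range(1, 100)]
f_mid = knn_density(0.505, data, k=10)
```

When x is itself a data point, its own zero distance is counted among the k, so the interval contains k-1 additional points, in line with the description above.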

The algorithms employed for the KTDEN and TKDEN procedures are
perhaps the easiest to program. The most difficult step is generating
a system of orthogonal functions. Since KTDEN and TKDEN compute
moment estimators based on orthogonal systems with pleasing forms,
the algorithms for generating density estimates are easily programmed.
The trigonometric polynomials or complex exponentials may be
programmed directly into the averaging routine without the problem of
generating coefficients, as would be the case if Legendre [-1,1]
polynomials had been used. The data must be standardized to (0,1), but
this is easily implemented directly in the algorithms using the
transformation (Y(I)-A)/(B-A) where A=min(X) and B=max(X) and Y is the
vector of X values described above. Estimates are then computed using
a truncated order MN=m(n), with the maximum order currently set at 10.
The vector THETA of moment coefficients computed for the orthogonal
expansion need only be computed once up to the maximum order by virtue
of the orthogonality property. Estimates of varying orders may then
be obtained by simply calling for the coefficients needed. The
routine TKDEN computes a best order using the MISE criterion, while
KTDEN merely displays plots for user specified orders.
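
The standardization and moment-coefficient steps above can be sketched as follows. The Python fragment assumes the cosine system 1, sqrt(2)cos(k*pi*y) on [0,1], which may differ in detail from the system programmed in KTDEN; the function names are hypothetical.

```python
import math

def standardize(xs):
    """Map the data to [0, 1] using (x - A) / (B - A), A = min, B = max."""
    a, b = min(xs), max(xs)
    return [(x - a) / (b - a) for x in xs]

def cosine_coefficients(ys, max_order=10):
    """Sample moments theta_k = mean of cos(k*pi*y), computed once up to max_order."""
    n = len(ys)
    return [sum(math.cos(k * math.pi * y) for y in ys) / n
            for k in range(1, max_order + 1)]

def series_density(y, theta, order):
    """Truncated expansion 1 + 2 * sum_{k<=order} theta_k cos(k*pi*y) on [0, 1]."""
    return 1.0 + 2.0 * sum(theta[k - 1] * math.cos(k * math.pi * y)
                           for k in range(1, order + 1))

# Illustration: evenly spaced data should give a nearly flat estimate.
xs = [2.0 + 3.0 * i / 199 for i in range(200)]
ys = standardize(xs)
theta = cosine_coefficients(ys)       # computed once, up to order 10
f_low = series_density(0.3, theta, 3)   # lower orders reuse the same coefficients
f_high = series_density(0.3, theta, 5)
```

Because the coefficients are sample moments of orthogonal functions, changing the order only changes how many entries of `theta` are used. The estimate is on the standardized scale; dividing by B-A returns it to the original scale.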

The routine KRDEN merely generates the input parameters for the
IMSL routine NDKER which is documented in the IMSL Library, Volume 2
(1980). Various values of H=h(n) may be employed to obtain different
shapes for the estimated density. The kernel employed is the normal
kernel programmed in the function subprogram XNKER.
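
A kernel estimate with a normal kernel and bandwidth h can be sketched as below. This Python fragment illustrates the general technique only; it does not reproduce the calling sequence of NDKER or XNKER, and the function names are hypothetical.

```python
import math

def normal_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, h):
    """Kernel estimate (1 / (n h)) * sum K((x - x_i) / h)."""
    return sum(normal_kernel((x - xi) / h) for xi in data) / (len(data) * h)

# Illustration: larger h gives a flatter, smoother estimate at the center.
data = [-1.0, -0.2, 0.0, 0.3, 1.1]
f_narrow = kernel_density(0.0, data, h=0.3)
f_wide = kernel_density(0.0, data, h=1.0)
```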

The  routines  MIDEN  and  CMPDEN  are  the  most  complicated  routines 
in  that  a  variety  of  regression  subprograms  must  be  employed  to 
generate  the  parameters  of  interest.  MIDEN  uses  a  subprogram  called 
LEGP  to  generate  the  matrix  COF  of  Legendre  polynomial  coefficients. 
CMPDEN  has  the  advantage  shared  by  KTDEN  and  TKDEN  of  employing  the 
orthogonal  functions  directly  in  the  algorithm.  The  difference 
between  these  techniques  and  the  above  orthogonal  expansion  techniques 
is  that  a  covariance  matrix  COV  (PHI  in  CMPDEN)  must  be  computed  and 
supplied  to  a  least  squares  algorithm  to  obtain  least  squares 
parameter  estimates  for  the  truncated  orthogonal  expansion.  A  SWEEP 
operator  is  employed  in  the  sequential  regression  routine  SEQREG  to 
obtain  the  coefficients  in  MIDEN,  while  a  complex  SWEEP  operator  is 
used  in  CSQREG  for  CMPDEN.  The  SWEEP  operator  has  many  computational 
advantages. Kennedy and Gentle (1980) discuss some of its properties.
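
To illustrate why a SWEEP operator suits sequential regression, the sketch below applies one common form of the operator to an augmented cross-product matrix. It is a Python illustration under the sign convention stated in the comments, not the SEQREG code; the function name `sweep` is hypothetical.

```python
def sweep(a, k):
    """Return a copy of the symmetric matrix a swept on pivot k.

    Convention assumed here: the pivot becomes -1/d, the pivot row and
    column are divided by d, and all other entries are reduced by
    a[i][k] * a[k][j] / d, where d = a[k][k].
    """
    n = len(a)
    d = a[k][k]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / d
            elif i == k or j == k:
                out[i][j] = a[i][j] / d
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out

# Illustration: y = 1 + 2x at x = 0, 1, 2, 3, with columns [1, x, y].
cols = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]]
cross = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

step1 = sweep(cross, 0)  # intercept only: last column holds the mean of y
step2 = sweep(step1, 1)  # add x: last column holds intercept and slope
intercept, slope, rss = step2[0][2], step2[1][2], step2[2][2]
```

Each additional sweep brings one more regressor into the model and updates the earlier coefficients and the residual sum of squares in place, which is the property a sequential regression routine exploits.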

Both MIDEN and CMPDEN use the logarithm of an initial density
estimate from NNDEN to serve as a dependent variable in the regression
framework. A value of 8 is used for m(n) as a default, but was
modified for some of the runs described in Chapter 6. For comparison
purposes, a version of CMPDEN was written that used NDKER in place of
NNDEN to obtain initial estimates, and while results were excellent
for the special case considered, some reasons will be given in Chapter
6 why this practice is not recommended. Note that the maximum order
of expansion is set at 10 for both procedures and that orthogonality
with respect to the initial estimate has not been induced. This means
that addition of terms in the expansion will change all of the
previously computed coefficients. Consequently, one may wish to
modify this procedure accordingly, but one will note in the next
chapter that non-orthogonality does not pose a serious problem.

In  programming  the  complex  regression  technique  for  CMPDEN, 
several  early  failures  emphasize  the  importance  of  taking  care  when 
dealing  with  complex  values.  Although  the  final  density  estimate  will 
be  real  if  one  employs  conjugate  pairs  in  the  expansion,  the  erroneous 
deletion  of  the  imaginary  part  of  the  estimated  coefficients 
invalidates  the  obtained  estimate.  One  should  avoid  the  suppression 
of  imaginary  terms  to  obtain  real  valued  estimates  until  the  procedure 
has  been  thoroughly  tested.  Examinations  of  the  results  of  Chapter  6 
reveal  that  the  technique  for  obtaining  estimates  in  both  TKDEN  and 
CMPDEN  produces  complex  valued  coefficients.  When  applied  properly  in 
conjugate  pairs,  the  imaginary  terms  always  vanish. 
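
The point about conjugate pairs can be checked numerically. The Python sketch below assumes basis functions exp(2*pi*i*v*u) and made-up coefficient values; it shows that pairing each coefficient with its conjugate gives a real sum, and that discarding the imaginary parts of the coefficients gives a different value.

```python
import cmath
import math

def paired_expansion(u, coef):
    """Sum of theta_v e^{2 pi i v u} plus its conjugate term for each v > 0."""
    total = 0j
    for v, theta in coef.items():
        total += theta * cmath.exp(2j * math.pi * v * u)
        total += theta.conjugate() * cmath.exp(-2j * math.pi * v * u)
    return total

# Hypothetical coefficients, chosen only for illustration.
coef = {1: 0.3 - 0.2j, 2: -0.1 + 0.05j}
value = paired_expansion(0.37, coef)

# Erroneously dropping the imaginary parts of the coefficients.
real_only = {v: complex(t.real, 0.0) for v, t in coef.items()}
wrong = paired_expansion(0.37, real_only)
```

Here `value` has a vanishing imaginary part, while `wrong` is also real but differs from `value`, which is the failure described above.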

The  problem  of  order  determination  has  been  considered  in  both 
TKDEN  and  CMPDEN.  As  mentioned,  the  MISE  criterion  is  used  to  obtain 
a  "best"  order  for  the  expansion  in  TKDEN.  The  AIC  criterion  is 
computed  in  CMPDEN  to  diagnose  a  best  order,  but  in  practice  this 
criterion  does  not  seem  to  perform  as  well  as  the  MISE  criterion. 

A parametric version of MIDEN called NPDEN was created
specifically to handle least squares estimation of the parameters in
the canonical exponential model representation of gamma and normal
densities. The independent variables are log x, x, and x^2 in the
expansion and the coefficients are computed by inserting the
appropriate variables into the regression model.
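
For the normal case the idea can be sketched as follows: the log density is quadratic in x, so regressing it on 1, x, and x^2 recovers the mean and variance from the fitted coefficients. This Python fragment is an illustration only. It regresses the exact log density rather than the logarithm of an initial NNDEN estimate, and the helper names `solve` and `least_squares` are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Least squares coefficients from the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# log f(x) = c + t1*x + t2*x^2 with t1 = mu/sigma^2 and t2 = -1/(2 sigma^2).
mu, sigma = 2.0, 1.5
xs = [mu - 4.0 + 8.0 * i / 40 for i in range(41)]
logf = [-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        for x in xs]
c, t1, t2 = least_squares([[1.0, x, x * x] for x in xs], logf)
sigma2_hat = -1.0 / (2.0 * t2)
mu_hat = t1 * sigma2_hat
```

For a gamma density the regressors would be 1, log x, and x instead, since the log density is linear in those two variables.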

The  utility  of  the  minimum  information  regression  approach  is 
illustrated by the two step FORTRAN-SAS program listed in Appendix
A.3. A FORTRAN program uses LEGP to generate independent variables
and  NNDEN  to  generate  the  dependent  variable  to  be  used  in  the  SAS 
procedure  GLM.  Predicted  values  are  written  into  a  data  set  called 
TWO,  and  in  a  DATA  step  the  exponent  of  the  predicted  values  is 
obtained  and  submitted  to  the  procedure  PLOT  where  the  estimated 
density  is  then  plotted  against  the  original  X  value.  This  two  step 
procedure  exemplifies  the  ease  in  adapting  the  minimum  information 
regression  approach  to  existing  regression  software.  More  difficulty 
may  be  encountered  in  the  use  of  complex  exponentials  because  complex 
regression  routines  are  not  to  be  found  in  the  major  statistical 
packages. 

Appendix C contains some useful subprograms accessed by the above
routines. Appendix D contains several of the plotting subprograms
employed to obtain printer plots of the estimated densities. PLOTXY
generates 45 equally spaced X values and generates the corresponding Y
values by linear interpolation. Then a plot of Y as a function of X
is produced with the values of X and Y printed for each plotted point.
Corresponding to this routine is PLTXYZ which produces an overlay plot
of Y and Z as functions of X. The routine PPLOT produces a more
appealing plot but presents only a scale of X and Y values. PPLOT is
recommended for scatter plots as it does not induce linearity by
interpolating values as does PLOTXY. For more elaborate plotting
using the Versatec plotting software, the subprogram GPLOT of the
TIMESBOARD Library (Newton, 1979) is a flexible multiple purpose
routine for obtaining high resolution plots. SAS/GRAPH (1981) also
contains many useful plotting procedures. A version of NPDEN accesses
GPLOT to obtain plots on the Versatec Electrostatic Plotter. Some of
the three dimensional plotting procedures of SAS/GRAPH will be
mentioned in the next section.

5.4  The  BISAM  Bivariate  Data  Modeling  Program 

The program BISAM is a general purpose bivariate data modeling
program designed to provide a variety of univariate and bivariate
descriptive statistics and graphical output. The core of BISAM is the
routine CMPINF (CoMPlex regression using INFormation functionals)
which serves as a generalization of CMPDEN to the bivariate case.
Within CMPINF the rank transformed data is exposed to a bivariate
version of NNDEN with computations proceeding as in CMPDEN. A modular
arithmetic scheme is devised to imbed the two-dimensional subscripting
of coefficients into a one dimensional indexing scheme. A maximum
order of 7 is specified and a 49 by 49 covariance matrix (omitting the
constant term) is then supplied to CSQREG described above. Currently,
orders 8, 24, and 48 are automatically supplied with the user having
the option to override these values with user-supplied even orders (to
ensure that the estimated dependence density is real). A listing of
the main body of BISAM along with the major subprograms employed may
be found in Appendix B. The minor and peripheral subprograms employed
may be found in Appendices C and D.
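
One way a modular arithmetic scheme can map two-dimensional subscripts to a single index is sketched below in Python. The subscript range -m..m in each coordinate is an assumption made for illustration, not the documented CMPINF layout; it is adopted because it yields (2m+1)^2 - 1 non-constant terms, that is 8, 24, and 48 for m = 1, 2, 3, which coincides with the default orders quoted above. The function names are hypothetical.

```python
def to_flat(v1, v2, m):
    """Map a subscript pair (v1, v2), each in -m..m, to a single index."""
    width = 2 * m + 1
    return (v1 + m) * width + (v2 + m)

def from_flat(index, m):
    """Recover (v1, v2) from the single index by integer division and remainder."""
    width = 2 * m + 1
    q, r = divmod(index, width)
    return q - m, r - m

def term_indices(m):
    """All subscript pairs except the constant term (0, 0)."""
    pairs = [from_flat(i, m) for i in range((2 * m + 1) ** 2)]
    return [p for p in pairs if p != (0, 0)]

counts = [len(term_indices(m)) for m in (1, 2, 3)]
```

Under this layout every pair (v1, v2) appears together with (-v1, -v2), so the number of non-constant terms is always even, which fits the remark that even orders keep the estimated dependence density real.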

The main calling program of BISAM reads the bivariate data set
with the variables read in one at a time as univariate data sets. The
first card of a data set should contain a title describing
the data set. The second card should contain the number of data
points N and the format of the input variables. The format that reads
the second card is given by (I5,**X,5A4). Coded in the appropriate
format, the N data points should then follow. When BISAM finishes
reading the two data sets, it immediately checks to see if the two
sample sizes agree. If they do not, the program terminates. Before
reading the two data sets, however, the user must specify several
options to be employed in the analysis. Hence, the first data card
will be an option card with the following input options in (9I5)
format:

NTAPE     - tape where data set resides

MORD      - maximum autoregressive order to be used for
            univariate density estimation (≤6)

IDQX,IDQY - null distributions for autoregressive smoothing:
              1 = Normal         4 = Double Exponential
              2 = Exponential    5 = Uniform reciprocal
              3 = Logistic       6 = Cauchy
            A value of 1 is recommended. For a more complete
            listing and description, see Parzen and Anderson
            (1980).

IPLT1     - Scatter plot options:
              0 = no scatter plots
              1 = scatter plot of data
              2 = scatter plot of rank transformed data
              3 = both scatter plots

IPLT2     - Univariate density plotting options:
              0 = no quantile box plots
              1 = produce quantile box plots

IDST      - Univariate descriptive statistics:
              0 = no descriptive statistics displayed
              1 = descriptive statistics computed and displayed
                  for each variable

KDEL      - Maximum number of extreme points to exclude from
            analysis.
            Extreme points are determined by distance from the
            median, and if X and Y extreme points correspond,
            they count as two points although only one will be
            excluded from the analysis. Hence, KDEL usually
            exceeds the actual number of points so deleted.


These  options  illustrate  the  variety  of  output  available  from 
BISAM.  The  univariate  descriptive  statistics,  plots,  and 
autoregressive  density  estimates  are  obtained  by  employing  some 
modified routines from ONESAM (Parzen and Anderson, 1980). The
routine PPLOT described in the last section is used to obtain the
scatter  plots,  while  PLOTXY  is  used  to  plot  the  univariate 
density-quantile  functions. 

Before  calling  CMPINF,  several  routines  are  accessed  that  provide 
some  of  the  standard  correlation  statistics  discussed  in  section  6.3. 
CMPINF  then  uses  the  various  estimates  of  the  dependence  density  to 
obtain  entropy  measures  of  association.  These  are  displayed  along 
with  the  standard  correlation  statistics  at  the  end  of  the  output. 
Intermediate output consists of univariate descriptive statistics and
plots  (if  requested)  and  coefficients  for  the  expansion  of  the 
logarithm  of  the  dependence  density.  An  integrating  factor  is  also 
displayed  providing  a  diagnostic  as  to  the  legitimacy  of  the  estimated 
dependence  density.  Sample  output  from  BISAM  will  be  presented  in  the 
next  chapter. 

The current version of BISAM writes the values of an estimated
dependence density and bivariate density-quantile function to a
temporary disc file to be accessed by a SAS/GRAPH procedure. PROC G3D
and PROC GCONTOUR are then employed to produce three dimensional plots
and contour plots of the appropriate function. Output from these
procedures appears in the next chapter. The FORTRAN routine CPLOT
written by Phil Spector provides a contour printer plot. Output from
this routine appears following the parameter estimates for each order
of approximation. The use of CPLOT has currently been suppressed
because of the availability of SAS/GRAPH.

5.5  A  Note  on  Computer  Facilities 

The programs mentioned were developed and run on an Amdahl 470
V/6 or Amdahl 470 V/78 operated by the Data Processing Center at Texas
A  and  M  University.  The  operating  system  employed  is  MVS/JES3 
allowing  the  joint  operation  of  both  Amdahls  as  a  single  system.  The 
WYLBUR  text  manipulation  system  was  used  to  type,  edit,  and  run  the 
programs  from  a  CRT  terminal. 

The  program  BISAM  is  executed  using  a  one  step  system  procedure 
FORTX  that  compiles,  loads,  and  executes  a  FORTRAN  program.  Since 
BISAM  has  been  in  the  developmental  stage,  a  compiled  version  has  not 
been  created  so  run  times  reflect  the  more  inefficient  procedure 
employed.  A  typical  run  of  BISAM  for  a  bivariate  data  set  of  size  100 
requesting  all  output  will  use  about  50  CPU  seconds  and  will  entail 
the  reading  of  2398  card  images.  Naturally,  different  data  sets  of 
the  same  size  may  have  vastly  different  characteristics  (such  as  the 
number  of  tied  observations,  correlation,  etc.)  so  that  the  above 
represents  only  an  approximation.  Furthermore,  the  above  results  are 
by  no  means  typical  for  all  computers  and  are  stated  here  only  as  a 
rough  guide. 


6.  EXAMPLES  AND  APPLICATIONS 


6.1  Introduction 

Testing  a  new  statistical  technique  entails  two  stages  of 
verification: 

1)  Check  the  techniques  using  data  from  known  parent  distributions. 

2)  Expose  the  techniques  to  "real  data"  and  compare  and  contrast  the 
results  to  those  obtained  using  other  techniques. 

Poor  results  at  stage  (1)  should  cause  one  to  discard  a  trial 
methodology  without  bothering  to  continue  to  stage  (2),  although  valid 
exceptions  to  this  practice  may  occur.  Often  poor  results  from  the 
first  stage  suggest  modifications  to  improve  a  procedure  so  that  it 
need  not  be  completely  discarded.  The  second  stage  should  be 
emphasized,  however,  as  it  simulates  an  environment  in  which  a 
technique  will  actually  be  used.  One  need  not  confirm  the  analysis  of 
others.  In  fact,  if  the  new  technique  is  more  sensitive  than  existing 
ones,  it  may  suggest  additional  interpretations  of  experimental 
results  that  are  more  appealing  than  those  previously  obtained.  In 
the  area  of  data  analysis,  one  is  especially  interested  in  obtaining 
as  much  insight  as  possible  about  the  nature  of  a  data  set.  Hence,  a 
technique  offering  extended  diagnostics  is  especially  welcome.  One 
must be careful, however, to avoid being overwhelmed by an abundance
of diagnostics that may be performing similar tasks. Each
statistician  must  choose  those  techniques  which  he  feels  are  best 
suited  for  data  analysis,  and  if  additional  procedures  are  warranted, 
they  may  be  applied  as  needed.  For  example,  if  a  statistician  prefers 
to  use  two  density  estimation  techniques  to  get  an  idea  of  the 
distribution  of  a  data  set,  and  both  techniques  give  conflicting 
results,  a  third  technique  may  be  employed  in  an  attempt  to  verify  or 
contradict  the  results  already  obtained.  Consequently,  one  may  prefer 
to  withhold  sophisticated  and  expensive  procedures  unless  the  easier, 
less  expensive  methods  fail  to  adequately  deal  with  the  problem  at 
hand.  Many  statisticians  are  content  to  use  a  histogram  to  diagnose 
symmetry  when  the  inference  tool  to  be  employed  is  fairly  robust  to 
slight  deviations  from  symmetry,  but  rarely  would  a  histogram  be 
adequate  to  confirm  rigid  distributional  requirements. 

In  this  chapter,  some  of  the  procedures  developed  in  previous 
chapters  will  be  exposed  to  simulated  and  real  data  sets.  At  stage 
one,  techniques  with  subjective  smoothing  requirements  are  often 
easily  made  to  conform  to  the  simulated  shape  or  value.  It  is  at 
stage  two  that  the  weaknesses  of  the  subjective  factors  involved  are 
exposed.  Interpretations  will  be  offered  for  conflicting  results. 

6.2  Univariate  Examples 

To examine the univariate density technique of Chapter 3, we
consider three data sets. These are given by

A) 100 Gamma(10,1) r.v.'s

B) 50 N(0,1) r.v.'s mixed with 50 N(3,.25) r.v.'s

C) 63 observations on snowfall in Buffalo, New York,
from 1910 to 1972.


Listings  of  these  data  sets  may  be  found  in  Tables  5  through  7. 


Table 5. 100 Observations from a Gamma(10,1) Distribution

 4.24783    7.13944    8.54367   10.02394   12.09786
 5.04321    7.15666    8.61258   10.08187   12.12808
 5.2W       7.21348    8.62224   10.31604   12.30663
 5.42475    7.27734    8.63897   10.42786   12.41536
 5.44763    7.32370    8.77373   10.54578   12.53470
 5.53802    7.39561    8.87619   10.56661   12.81958
 5.59490    7.47524    8.90276   10.57409   12.88938
 5.67504    7.50638    9.06354   10.67747   12.96910
 6.13585    7.54331    9.09246   10.90295   13.02124
 6.20833    7.83998    9.23594   10.90404   13.75874
 6.21723    7.84698    9.25439   11.00485   13.87460
 6.37002    7.94389    9.37018   11.13315   13.95213
 6.38983    8.03307    9.38267   11.13889   14.08677
 6.46811    8.03327    9.49000   11.28637   14.19301
 6.50198    8.10931    9.51031   11.31890   14.19326
 6.52324    8.12584    9.57556   11.38006   15.87417
 6.57701    8.197H     9.63029   11.47916   16.10650
 6.61291    8.31321    9.81822   11.61629   18.13609
 6.75889    8.39989    9.89H5    11.69777   18.79675
 7.01929    8.40721    9.96447   12.02569   19.88026


Table 6. Sample from the Normal Mixture .5 N(0,1) + .5 N(3,.25)

-2.46836   -0.07938    0.93945    2.58292    3.07817
-1.76497   -0.07224    1.04555    2.61084    3.08599
-1.54261   -0.03350    1.15230    2.65899    3.08894
-1.39970   -0.01924    1.15795    2.66611    3.18530
-1.18157    0.02757    1.27020    2.66671    3.19295
-0.92799    0.03663    1.32825    2.69727    3.21383
-0.78408    0.05647    1.35215    2.71167    3.22806
-0.78233    0.08547    1.51836    2.73020    3.23082
-0.72175    0.09712    1.86996    2.76127    3.27329
-0.71528    0.40045    1.91409    2.83441    3.28711
-0.69271    0.54350    1.99246    2.83984    3.34942
-0.63495    0.57335    2.01725    2.84271    3.43836
-0.62522    0.57826    2.17318    2.84482    3.49594
-0.58790    0.64629    2.17619    2.86786    3.49905
-0.42311    0.68756    2.25658    2.92654    3.52823
-0.41908    0.70090    2.28519    2.95894    3.68402
-0.39825    0.79734    2.39207    3.02427    3.70910
-0.22372    0.82677    2.42322    3.03565    3.74198
-0.19536    0.83868    2.43118    3.05925    4.02736
-0.18169    0.90461    2.47505    3.07473    4.17188

will  consider  a  set  of  100  simulated  N(0,1)  random  variables  and 
compare  the  least  squares  estimates  to  the  usual  maximum  likelihood  or 
UMVU  estimates  of  the  parameters.  The  normal  data  set  is  not  exposed 
to  the  other  procedures  because  they  all  seem  to  perform  well  for 
smooth,  symmetric  densities.  Data  sets  A  and  B  will  seek  to  test  the 
techniques for sensitivity to skewness or bimodality, while data set C
is included because it has been analyzed in a variety of density
estimation references (see, e.g., Parzen, 1979, or Tapia and Thompson,
1978).


The popularity of the histogram and the fact that the bin width
problem is well documented make its analysis here unnecessary. For
illustrative purposes, we include Figure 6 showing the output of PROC
CHART of SAS for the Buffalo snowfall data.

Table 7. Yearly Snowfall in Buffalo, New York, 1910-1972

 25.0    58.0    77.8    85.5   104.5
 39.8    60.3    78.1    87.4   105.2
 39.9    63.6    78.4    88.7   110.0
 40.1    65.4    79.0    89.6   110.5
 46.7    66.1    79.3    89.8   110.5
 49.1    69.3    79.6    89.9   113.7
 49.6    70.9    80.7    90.9   114.5
 51.1    71.4    82.4    97.0   115.6
 51.6    71.5    82.4    98.3   120.5
 53.5    71.8    83.0   101.4   120.7
 54.7    72.9    83.6   102.4   124.7
 55.5    74.4    83.6   103.9   126.4
 58.0    77.8    85.5

Using the FORTRAN routines documented in the last chapter, we
obtain density estimates to be labeled as follows:

FNN - Nearest Neighbor density estimate

FKR - Kernel estimate

FKT - Kronmal-Tartar trigonometric series estimate

FTK - Tartar-Kronmal complex Fourier series estimate

FMI - minimum information estimate using Legendre
[-1,1] polynomials

FMC - minimum information estimate using complex
Fourier series
Table 8. Selected Output from SAS PROC UNIVARIATE for Data Set A

MOMENTS

N            100         SUM WGTS     100
MEAN         9.64664     SUM          964.664
STD DEV      3.06639     VARIANCE     9.40273
SKEWNESS     0.904072    KURTOSIS     1.05021
USS          10236.6     CSS          930.871
CV           31.7871     STD MEAN     0.306639
T:MEAN=0     31.4593     PROB>|T|     0.0001

QUANTILES

100% MAX     19.8802     99%          19.8694
 75% Q3      11.3648     95%          15.7901
 25% Q1      7.34168     10%          6.20922
  0% MIN     4.24783      5%          5.45215
                          1%          4.25578
RANGE        15.6324
Q3-Q1        4.0231
MODE         4.24783

FTK  will  only  be  applied  to  data  set  C  to  illustrate  the  MISE  order 
determining  criterion.  Examples  of  the  application  of  autoregressive 
density  estimation  to  Buffalo  snowfall  may  be  found  in  Parzen  (1979b) 
along  with  other  examples,  and  hence  will  not  be  included  here.  Tapia 
and  Thompson  (1978)  also  give  a  variety  of  examples  for  various 
density  estimation  routines.  Tartar  and  Kronmal  (1976)  illustrate  the 
use  of  FTK  applied  to  relatively  smooth  data  sets.  We  include  the 
above  estimates  for  comparison  and  illustrative  purposes. 

For data set A, Figures 7 through 11 contain the "best" estimates
obtained from a procedure overlaid with the true population density
used to generate the data. One notes that FNN has difficulty
smoothing out the mode of the density, but the value at the mode is
0.14 which is close to the true value 0.13. The densities FKR, FMI,
and FMC all provide an adequate approximation to the parent
Gamma(10,1) density. Coefficients for FKT for the three data sets are
displayed in Table 11. Table 12 contains the coefficients for FMI,
and Table 13 contains coefficients for FMC.

Table 9. Selected Output from SAS PROC UNIVARIATE for Data Set B

MOMENTS

N            100         SUM WGTS     100
MEAN         1.52001     SUM          152.001
STD DEV      1.62488     VARIANCE     2.64023
SKEWNESS     -0.359466   KURTOSIS     -1.07955
USS          492.427     CSS          261.383
CV           106.899     STD MEAN     0.162488
T:MEAN=0     9.35463     PROB>|T|     0.0001

QUANTILES

100% MAX     4.17188     99%          4.17043
 75% Q3      2.95084     95%          3.67623
 50% MED     1.95327     90%          3.34319
 25% Q1      0.029835    10%          -0.713023
  0% MIN     -2.46836     5%          -1.16889
                          1%          -2.46133
RANGE        6.64024
Q3-Q1        2.921
MODE         -2.46836

For data set B, some of the techniques have a little more
difficulty approximating the bimodal parent density. Figures 12
through 17 contain plots for this data set. FNN once again seems to
fluctuate randomly about the modes but gives a good rough
approximation to the bimodal shape. FKR, FMI, and FMC once again seem
to provide the best results with some problem in estimating the true
value of the parent density at the modes, but diagnosing the bimodal
shape well. Figure 17 shows the exceptional ability of CMPDEN when it
is presented with an above average initial density estimate instead of
FNN; the resulting values of FMC are extremely close to the true
values of the parent density. The only discrepancies occur in the
tail areas, which is typical for the orthogonal expansion techniques.

Table 10. Selected Output from SAS PROC UNIVARIATE for Data Set C

MOMENTS

N            63          SUM WGTS     63
MEAN         80.2952     SUM          5058.6
STD DEV      23.7198     VARIANCE     562.629
SKEWNESS     -0.0186313  KURTOSIS     -0.562101
USS          441065      CSS          34883
CV           29.5407     STD MEAN     2.98842
T:MEAN=0     26.8688     PROB>|T|     0.0001

QUANTILES

100% MAX     126.4       99%          126.4
 75% Q3      98.3        95%          120.66
 50% MED     79.6        90%          114.18
 25% Q1      63.6        10%          49.3
  0% MIN     25           5%          39.94
                          1%          25
RANGE        101.4
Q3-Q1        34.7
MODE         82.4

Figure 6. Histogram of Buffalo Snowfall Data

Figure 7. Nearest Neighbor Density Estimate for Gamma(10,1) Data,
k=?5

For data set C, Figures 18 through 23 represent the different
shapes one may subjectively obtain from a density estimation
procedure. FNN seems to indicate a trimodal parent density. FKR
depicts a unimodal shape. FTK is included in this analysis with
coefficients listed in Table 14. For FKT and FTK, functionals of the
estimated p.d.f. were included to compare to the usual unbiased
estimates of location and scale. Table 15 contains selected
functionals of the estimated densities for the three data sets.

Figure 8. Kernel Density Estimate for Gamma(10,1) Data, h=1.0

Recall from Table 10 (p. 160) that the sample mean for Buffalo snowfall
is 80.29 and the sample variance is 562.629. Typically, for different
shapes, an orthogonal expansion estimate presents fairly stable
functional values for the mean and variance. FKT and FTK admit both
unimodal and trimodal shapes. Using the MISE criterion for FTK, a
unimodal shape is obtained with mean and variance "close" to their
unbiased counterparts. An order 7 approximation for FKT is displayed
to illustrate a trimodal shape, although a unimodal shape is obtained
for orders 1 through 4. Again, the mean and variance estimates are
similar to their unbiased counterparts.

Figure 9. Trigonometric Series Density Estimate for
Gamma(10,1) Data, m=7

FMI and FMC tend to provide different shapes, and as exemplified
above, the complex expansion tends to introduce multimodal estimates
at lower orders than the polynomial expansions. The order 6 estimate
for FMI depicts the transition from a unimodal to a trimodal shape.
The order 6 estimate for FMC already clearly indicates a trimodal
shape.

Figure 10. Minimum Information Series Estimate using Legendre
Polynomials for Gamma(10,1) Data, m=3


One  might  note  that  for 


the  two  objective  criteria  MISE  and  CAT, 


a  uni  modal  shape  for  Buffalo  snowfall  is  indicated,  although  most 
procedures  will  admit  trimoda)  shapes.  (See  Parzen,  1979.  for 
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Figure  11.  Minimum  Information  Series  Estimate  using  Complex 
Exponentials  for  Gamma(10,1)  Data,  m-6 


illustration  of  CAT  results).  When  the  true  p.d.f.  is  known,  as  is 
the  case  for  data  sets  A  and  B,  a  procedure  can  usually  be  steered  to 
produce  a  desired  shape.  Consequently,  a  better  test  for  a  procedure 
would  be  for  stability  or  objectivity.  The  estimate  FNN  is  fairly 
stable  for  orders  of  5  or  more  in  terms  of  representing  a  fixed  number 
of  modes  although  fluctuations  in  mode  estimates  occur.  The 
orthogonal  expansion  techniques  tend  to  be  unstable,  especially  for 
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DATA  SET  A: 

Table 

1 

11.  Trigonometric  Series 

COF  DATA  SET 

Coef  f  ic 

B:  1 

ents 

COF 

1 

0.0530 

1 

-0.0752 

2 

-0.0234 

2 

-0.0235 

3 

-0.0317 

3 

-0.0289 

4 

-0.01 1 1 

4 

-0.0595 

5 

-0.0123 

5 

0.0779 

6 

-0.0068 

6 

-0.0456 

7 

-0.0044 

7 

0.0393 

8 

-0.0027 

8 

0.0123 

9 

-0.0021 

9 

-0.0222 

10 

-0.0015 

10 

-0.0071 

DATA  SET  C: 

i 

COF 

l 

-0.0021 

2 

-0.0057 

3 

-0.0002 

4 

0.0014 

5 

-0.0023 

6 

-0.0021 

7 

0.0032 

8 

0.0014 

9 

-0.0013 

10 

0.0006 

large  orders  of  approximation,  but  objective  use  of  criterion 
functions  helps  to  overcome  this  problem.  The  RISE  criterion  of 
Tartar  and  Kronmal  (1970,1976)  does  not  translate  effectively  to  the 
minimum  information  procedure,  but  one  suspects  a  modification  of 
WISE,  CAT,  or  AIC  should  better  handle  the  problem.  Further  research 
is  warranted  in  this  area.  Consequently,  we  do  not  advocate  as  yet 
the  use  of  minimum  information  techniques  over  autoregressive 
estimation  with  the  CAT  criterion  or  estimation  using  FTK  with  the 
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Table  12.  Coeff 

i c i ents 

for  FMI  Density  Estimate 

Data 

Set  A:  Order 

Coef 

Data  Set  B:  Order 

Coef 

1 

-1.411 

1 

0.658 

2 

-0.990 

2 

-0.986 

3 

0.749 

3 

0.228 

4 

-0.746 

Data 

Set  Cs  1 

0.237 

5 

-0.977 

2 

-0.951 

6 

0.267 

3 

0.012 

7 

0.552 

4 

0.164 

8 

0.392 

5 

0.082 

6 

-0.227 

WISE  criterion.  However,  in  the  bivariate  case,  the  minimum 
information  approach  is  easily  extended  and  seems  more  appropriate 
than  existing  procedures^  especially  in  terms  of  controlling  the 
amount  of  graphical  displays  necessary  to  arrive  at  an  acceptable 
estimate. 

Table 13. Coefficients for FMC Density Estimate

    DATA SET A:
    IND   REAL(THETA)   IMAG(THETA)
     -1    -0.316603      0.596658
      1    -0.316603     -0.596659
     -2    -0.070131      0.161096
      2    -0.070130     -0.161097
     -3    -0.056627      0.069507
      3    -0.056627     -0.069507

    DATA SET B:
    IND   REAL(THETA)   IMAG(THETA)
     -1    -0.175565     -0.197545
      1    -0.175564      0.197546
     -2    -0.204320     -0.293631
      2    -0.204320      0.293632
     -3    -0.114078     -0.043741
      3    -0.114078      0.043741

    DATA SET C:
    IND   REAL(THETA)   IMAG(THETA)
     -1    -0.351673     -0.079655
      1    -0.351673      0.079656
     -2    -0.065335     -0.074402
      2    -0.065334      0.074402
     -3    -0.117258     -0.186600
      3    -0.117258      0.186600

Figure 12. Nearest Neighbor Density Estimate for
Normal Mixture Data, k=15

To illustrate the parametric application of the minimum
information approach to estimating parameters for normal and gamma
models, data set A is examined along with data set D consisting of a
random sample of 100 N(0,1) values. A parametric model for data set A
appears in Figure 24. Note the least squares estimates are a=8.94 and
b=0.94 which correspond to an estimated gamma mean of 9.51 and an
estimated gamma variance of 10.12. Data set D is listed in Table 16
with selected descriptive statistics from PROC UNIVARIATE of SAS
appearing in Table 17. Figure 25 shows the parametric representation
of the normal density with a least squares estimated mean of -0.10 and
variance of 1.25. These examples illustrate the parametric
applications of the minimum information procedures, suggesting
possible extensions to goodness-of-fit diagnostics.
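
The gamma moments quoted for data set A can be reproduced from the two
least squares estimates. The sketch below assumes that a is the shape
parameter and b is a rate (inverse scale) parameter, so that the mean is
a/b and the variance is a/b^2. This section of the text does not state the
parameterization; the reading is adopted here only because it matches the
reported values of 9.51 and 10.12.

```python
# Assumed parameterization: shape a, rate b (mean a/b, variance a/b**2).
a, b = 8.94, 0.94

gamma_mean = a / b
gamma_var = a / b ** 2

print(f"{gamma_mean:.2f}")  # 9.51
print(f"{gamma_var:.2f}")   # 10.12
```

Under a shape and scale reading the mean would instead be a*b, about 8.40,
which does not agree with the text.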


A  few  comments  are  in  order.  Examination  of  the  expansion 
coefficients  reveals  a  rather  rapid  decay,  as  expected,  but  higher 
order  coefficients  usually  remain  large  enough  to  induce  a  very  wiggly 
shape  to  the  estimated  density.  Furthermore,  expansions  using  complex 
exponentials  seem  to  require  fewer  terms  than  the  polynomial  or 
trigonometric  series  expansions.  This  indicates  that  complex  Fourier 
series  may  converge  more  rapidly  than  other  series  expansions,  a 


Figure 13. Kernel Density Estimate for Normal Mixture Data, h=1.0


conjecture  that  seems  to  be  supported  by  the  literature.  This 
motivated  using  only  complex  expansions  in  the  bivariate  case, 
although  one  may  wish  to  consider  other  choices  for  the  orthogonal 
system  of  functions  to  use. 
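
The coefficients for the complex exponential expansions in Table 13 occur
in conjugate pairs, theta(-k) = conj(theta(k)) up to rounding, which is
what makes a Fourier sum over k = -m, ..., m real valued. The sketch below
checks this for the data set A coefficients. It assumes the sum has the
form sum over k of theta(k) exp(2 pi i k u) on 0 <= u <= 1; that form and
the function name `fourier_sum` are illustrative assumptions, not the
definition used by the thesis programs.

```python
import cmath

# Data set A coefficients from Table 13, keyed by index k.
theta = {
    -1: complex(-0.316603, 0.596658), 1: complex(-0.316603, -0.596659),
    -2: complex(-0.070131, 0.161096), 2: complex(-0.070130, -0.161097),
    -3: complex(-0.056627, 0.069507), 3: complex(-0.056627, -0.069507),
}

def fourier_sum(u):
    """Assumed form: sum over k of theta_k * exp(2*pi*i*k*u)."""
    return sum(t * cmath.exp(2j * cmath.pi * k * u) for k, t in theta.items())

# Conjugate symmetry holds to about the six printed decimals.
pair_gap = max(abs(theta[-k] - theta[k].conjugate()) for k in (1, 2, 3))

# The imaginary part of the sum is therefore negligible on a grid of u values.
max_imag = max(abs(fourier_sum(i / 100).imag) for i in range(101))
print(pair_gap, max_imag)
```

Because of this symmetry only m complex coefficients are free, so an
order-m complex expansion carries 2m real parameters.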

In this section we have presented several examples of
nonparametric density estimation procedures. Essentially, we have let
the plots speak for themselves to illustrate obtainable shapes and the
corresponding anomalies that are produced. The examination of the
univariate ft I DEN and CMPDEN procedures illustrates the power of the
minimum information approach and justifies the examination of
bivariate extensions. Had CMPDEN performed poorly in the univariate
case, one would have little faith in the ability of its bivariate
counterpart. With this motivation, one may now examine the bivariate
extension of CMPDEN.

Figure 14. Trigonometric Series Density Estimate for
Normal Mixture Data, m=6

Figure 15. Minimum Information Series Estimate using Legendre
Polynomials for Normal Mixture Data, m=8

Figure 16. Minimum Information Series Estimate using Complex

Figure 17. Minimum Information Series Estimate using Complex
Exponentials with Kernel Initial Estimate for Normal
Mixture Data, m=6

Figure 18. Nearest Neighbor Density Estimate for
Buffalo Snowfall Data, k=10

Figure 19. Kernel Density Estimate for Buffalo Snowfall Data, h=5.0

Figure 20. Trigonometric Series Density Estimate for
Buffalo Snowfall Data, m=7

Figure 21. Complex Exponential Series Density Estimate for
Buffalo Snowfall Data, m=1

Figure 22. Minimum Information Series Estimate using Legendre
Polynomials for Buffalo Snowfall Data, m=6

Figure 23. Minimum Information Series Estimate using Complex
Exponentials for Buffalo Snowfall Data, m=6


Table 14. Coefficients for FTK Density Estimate

    DATA SET C
    IND   REAL(THETA)   IMAG(THETA)
     -1     -0.2881       -0.0793
      1     -0.2881        0.0793
     -2      0.0698       -0.0014
      2      0.0698        0.0014
     -3     -0.1041       -0.1915
      3     -0.1041        0.1915
     -4      0.0732        0.0980
      4      0.0732       -0.0980
     -5      0.0318       -0.0290
      5      0.0318        0.0290
     -6     -0.0218       -0.0974
      6     -0.0218        0.0974

Table 15. Selected Functionals of Estimated Densities

    Estimate         Mean     Variance    ISE      MSE     Max. Dev.

    Data Set A:
    True             10.00     10.00
    FNN (k=15)        9.79     10.25     0.0024   0.0002   0.0352
    FKT (m=7)         9.65      9.47     0.0019   0.0001   0.0332
    FMI (m=3)         9.48      9.01     0.0025   0.0002   0.0185
    Param. L.S.       9.51     10.12     0.0015   0.0001   0.0205

    Data Set B:
    True              1.50      2.88
    FNN (k=15)        1.49      2.79     0.0055   0.0009   0.0781
    FKR (h=0.4)       1.50      2.61     0.0077   0.0016   0.0805
    FKT (m=6)         1.53      3.28     0.0023   0.0002   0.0343
    FMI (m=8)         1.31      2.79     0.0068   0.0018   0.055;,

    Data Set C:
    FNN (k=10)       78.90    548.77       -        -        -
    FKR (h=5.)       79.92    524.36       -        -        -
    FKT (m=7)        80.33    548.67       -        -        -
    FTK (m=1)        73.15    558.96       -        -        -
    FMI (m=6)        76.79    595.74       -        -        -
    FMC (m=6)        79.03    532.64       -        -        -

    Data Set D:
    True              0.0       1.0
    Param. L.S.      -0.00:2    1.04     0.0001   0.0000   0.0066

A()-Ally  V15  TEXAS  A  A  NO  «  UNI*  COLLEK  STaTIO*  INST  Of  STATISTICS  F/fi  ! 

STATISTICAL  NOOiLlN*  OF  BIVARIATE  DATA. (U>  ° 

AU&  62  T  J  HOOOFIELD  0AA029— AH—  f— 007n 

UNCLASSIFIED  TR-B-7  Aft0-169*2 . 1 3-*U  n™ 


Figure 24. Parametric Gamma Least Squares Density Estimate


Table 16. 100 Observations from a Normal(0,1) Distribution

    -2.4543   -0.9761   -0.2562    0.2586    0.8609
    -1.9855   -0.9257   -0.2454    0.3443    0.9585
    -1.9848   -0.8681   -0.2044    0.3506    1.0888
    -1.7842   -0.8600   -0.1885    0.3836    1.0903
    -1.5225   -0.8447   -0.1862    0.3875    1.1151
    -1.4707   -0.8020   -0.1796    0.4137    1.1342
    -1.4021   -0.7875   -0.1696    0.4164    1.2297
    -1.3537   -0.7511   -0.1534    0.4279    1.2326
    -1.3480   -0.7226   -0.1420    0.4341    1.3730
    -1.2972   -0.6578   -0.1235    0.4372    1.4012
    -1.2791   -0.6520   -0.0955    0.4471    1.4153
    -1.2261   -0.5746   -0.0321    0.4938    1.4422
    -1.2138   -0.5337    0.0161    0.5047    1.4573
    -1.2040   -0.5241    0.0852    0.5215    1.4895
    -1.1369   -0.4537    0.1382    0.6104    1.5830
    -1.0814   -0.4434    0.1409    0.6427    1.6441
    -1.0793   -0.4220    0.1521    0.6830    1.6826
    -1.0661   -0.3949    0.1623    0.7354    1.7450
    -1.0071   -0.3725    0.1697    0.7788    1.8024
    -0.9990   -0.2596    0.2132    0.7833    2.3243

Table 17. Selected Output from SAS PROC UNIVARIATE
for Data Set D

    MOMENTS

    N           100           SUM WGTS    100
    MEAN        -0.03496      SUM         -3.496
    STD DEV     0.996355      VARIANCE    0.992724
    SKEWNESS    0.0461974     KURTOSIS    -0.555495
    USS         98.4019       CSS         98.2797
    CV          -2849.99      STD MEAN    0.0996355
    T:MEAN=0    -0.350879     PROB>|T|    0.726425

    QUANTILES

    100% MAX    2.3243        99%         2.31908
    75% Q3      0.634625      95%         1.64104
    50% MED     -0.1095       90%         1.41389
    25% Q1      -0.834025     10%         -1.29539
    0% MIN      -2.4543       5%          -1.51991
                              1%          -2.44961
    RANGE       4.7786
    Q3-Q1       1.46865
    MODE        -2.4543
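
As a consistency check, several entries of Table 17 can be recomputed from
the listing in Table 16. The sketch below uses only the Python standard
library; the values are transcribed from Table 16 row by row, and the
variable names are illustrative. It does not attempt to reproduce the SAS
quantile definitions other than the median.

```python
import statistics

# Data set D, transcribed from Table 16 (five values per table row).
data_d = [
    -2.4543, -0.9761, -0.2562, 0.2586, 0.8609,
    -1.9855, -0.9257, -0.2454, 0.3443, 0.9585,
    -1.9848, -0.8681, -0.2044, 0.3506, 1.0888,
    -1.7842, -0.8600, -0.1885, 0.3836, 1.0903,
    -1.5225, -0.8447, -0.1862, 0.3875, 1.1151,
    -1.4707, -0.8020, -0.1796, 0.4137, 1.1342,
    -1.4021, -0.7875, -0.1696, 0.4164, 1.2297,
    -1.3537, -0.7511, -0.1534, 0.4279, 1.2326,
    -1.3480, -0.7226, -0.1420, 0.4341, 1.3730,
    -1.2972, -0.6578, -0.1235, 0.4372, 1.4012,
    -1.2791, -0.6520, -0.0955, 0.4471, 1.4153,
    -1.2261, -0.5746, -0.0321, 0.4938, 1.4422,
    -1.2138, -0.5337, 0.0161, 0.5047, 1.4573,
    -1.2040, -0.5241, 0.0852, 0.5215, 1.4895,
    -1.1369, -0.4537, 0.1382, 0.6104, 1.5830,
    -1.0814, -0.4434, 0.1409, 0.6427, 1.6441,
    -1.0793, -0.4220, 0.1521, 0.6830, 1.6826,
    -1.0661, -0.3949, 0.1623, 0.7354, 1.7450,
    -1.0071, -0.3725, 0.1697, 0.7788, 1.8024,
    -0.9990, -0.2596, 0.2132, 0.7833, 2.3243,
]

n = len(data_d)
total = sum(data_d)
mean = total / n
variance = statistics.variance(data_d)   # divisor n - 1, as in SAS VARIANCE
median = statistics.median(data_d)
value_range = max(data_d) - min(data_d)

print(n, round(total, 3), round(mean, 5))        # 100 -3.496 -0.03496
print(round(median, 4), round(value_range, 4))   # -0.1095 4.7786
print(round(variance, 6))                        # compare with 0.992724
```

The sum, mean, median and range agree with the MOMENTS and QUANTILES
entries of Table 17; the printed variance can be compared with the
reported 0.992724.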
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6.3  Bivariate  Examples 

To illustrate the use of the BISAM program that implements the
bivariate data modeling approach detailed in Chapter 4, we consider
three data sets. These are

E) 500 independent N(0,1) bivariate observations,

F) 100 bivariate standard normal variables with a correlation
of 0.90,

G) the coronary data of Scott et al. (1978).

Selected output from BISAM for data sets E and F appears in Table 18.
Listings of the Coronary data appear in Tables 19 and 20, and selected
output from BISAM for this data appears in Tables 21 and 22.

Table 18. Summary of BISAM Output for Data Sets E and F

                      DATA SET E            DATA SET F
                      X         Y           X         Y
    Mean           -0.092    -0.085      -0.026    -0.025
    Median         -0.131    -0.145       0.027     0.060
    Trimean        -0.109    -0.111      -0.020     0.028
    Variance        0.973     0.954       1.120     1.078
    St. Dev.        0.987     0.977       1.058     1.038
    IQ Range        1.337     1.420       1.378     1.463

    Pearson r           -0.027                 0.893
    Spearman Rho        -0.038                 0.884
    Kendall Tau         -0.026                 0.710
    H(d-tilda)          -0.292                -0.858
    H(d8)               -0.056                -0.351
    H(d24)              -0.077                -0.462
    H(d48)              -0.104                -2.621

For the normal data we shall merely exhibit the results and note
the similarities of the population quantities being estimated.
Figures 26 through 29 display contour and three dimensional plots from
SAS/GRAPH. Scatter plots have been omitted as have the univariate
autoregressive density plots which were normal in all cases. The
figures illustrate slight anomalies that may occur in the "off
diagonal" areas for a high degree of correlation. These are due in
part to the extrapolation problem. Clearly too few data points occur
in these tail areas to adequately estimate the bivariate density there.
For the most part, the bivariate estimates of the normal densities are
pleasing. One notes that the mode occurring at the point (-0.31,-0.14)
for data set E is a little unusual, but this seems to result from the
simulation and not the modeling technique. The entropy estimates
exhibit the same type of instability discussed before, indicating that
some correction factor should possibly be employed in their
computation. The results for data set E with n=500 indicate that the
entropy statistics may be asymptotically biased. Otherwise, the
results for the normal case are satisfactory, leading one to consider
investigations with real data.
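
For reference, the population value that the entropy statistics estimate
is available in closed form for normal data. The sketch below assumes that
H(d) is defined as minus the integral of d log d over the unit square, so
that it equals the negative of the mutual information; under that
assumption a bivariate normal pair with correlation rho has
H(d) = 0.5 log(1 - rho^2). The function name is illustrative.

```python
import math

def normal_dependence_entropy(rho):
    """H(d) = 0.5 * log(1 - rho**2) for a bivariate normal, under the
    assumed definition H(d) = -integral of d * log(d)."""
    return 0.5 * math.log(1.0 - rho ** 2)

h_e = normal_dependence_entropy(0.0)    # data set E: independent components
h_f = normal_dependence_entropy(0.90)   # data set F: correlation 0.90
print(h_e, round(h_f, 3))               # 0.0 -0.83
```

Under this assumption the targets are 0 for data set E and about -0.83 for
data set F, against which the H entries of Table 18 can be read.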

Scott et al. (1978) consider two sets of data consisting of
measurements of plasma cholesterol (CHL) concentration and plasma
triglyceride (TRG) concentration in 371 males. The males were
classified into two groups, 320 falling into the category of
"diseased" and 51 being classified as "normal". These classifications
were based on medical examination of the patients to ascertain the
presence of coronary artery disease. Using a Kolmogorov-Smirnov
goodness-of-fit test, a null hypothesis of normality is rejected for
all but the diseased class of triglyceride data. This motivated the
use of the kernel method of bivariate density estimation to assist in
the analysis of the data. A likelihood equation was then developed to
aid in patient classification and to estimate the risk of coronary
artery disease based on CHL and TRG measurements.

Table 19. Listing of Coronary Data - Normal

PLASMA CHOLESTEROL - NO CORONARY ARTERY DISEASE

    195  237  205  201  190  180  193  170  150  200
    228  169  178  251  234  222  116  157  194  130
    206  158  167  217  234  190  178  265  219  265
    190  156  187  149  147  155  207  238  168  210
    208  160  243  209  221  178  289  201  168  162
    207

PLASMA TRIGLYCERIDES - NO CORONARY ARTERY DISEASE

    348  174  158  171   85   82  210   90  167  154
    119   86  166  211  143  284   87  134  121   64
     99   87  177  114  116  132  157   73   98  485
    108  126  109  146   95   48  195  172   71   91
    139  116  101   97  156  116  120   72  100  227
    160

Table 20a. Listing of Coronary Data - Diseased

PLASMA CHOLESTEROL - DISEASE IN AT LEAST 1 OF 3 CORONARY ARTERIES

    184  263  185  271  173  230  222  215  233  212
    221  239  168  231  221  131  211  232  313  240
    176  210  251  175  185  184  198  198  208  284
    231  171  258  164  230  197  216  230  265  197
    230  233  250  243  175  200  240  185  213  180
    208  386  236  230  188  200  212  193  230  169
    181  189  180  297  232  150  239  178  242  323
    168  197  417  172  240  191  217  208  220  191
    119  171  179  208  180  254  191  176  283  253
    220  268  248  245  171  239  198  247  219  159
    200  233  232  189  237  319  171  194  244  236
    260  254  250  196  298  306  175  251  255  285
    184  228  171  229  195  214  221  204  276  165
    211  264  245  227  197  196  193  211  185  157
    224  209  223  278  251  140  197  172  174  192
    221  283  178  185  181  191  185  206  210  226
    219  215  228  245  186  242  201  23»  179  218
    279  234  264  237  162  245  191  207  248  139
    246  247  193  332  194  195  243  271  197  242
    175  138  244  206  191  223  172  190  144  194
    105  201  193  262  211  178  331  235  267  227
    243  261  185  171  222  231  258  211  249  209
    177  165  299  274  219  233  220  348  194  230
    250  173  260  258  131  168  208  287  308  227
    168  178  164  151  165  249  258  194  140  187
    171  221  294  167  208  208  185  159  222  266
    217  249  218  245  242  262  169  204  184  206
    198  242  189  260  199  207  206  210  229  232
    267  228  187  304  140  209  198  270  188  160
    218  257  259  139  213  178  172  198  222  238
    273  131  233  269  170  149  194  142  218  194
    252  184  203  239  232  225  280  185  163  216

It was determined that the normal data exhibited a unimodal shape
with mode at (CHL,TRG) = (195,122). The diseased population was felt
to exhibit a bimodal shape with one mode at M1 = (185,122) and the
second mode at M2 = (233,1^5). The univariate kernel estimates were
unimodal. Consequently, it was felt that some diseased patients were
virtually indistinguishable from normal patients based on CHL and TRG
concentrations, but that a significant number corresponding to
contours in the region of M2 could be classified as diseased. An
interpretation was then given to explain the effect of triglyceride
concentrations in ascertaining the presence of coronary artery
disease, "over and above that implied by the co-existing levels of
plasma cholesterol alone." In their analysis, Scott et al. chose to

considerations (number of standard deviations from the mean). The
corresponding CHL values then must be eliminated in the bivariate
analysis.
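
The screening of outliers by number of standard deviations from the mean
can be sketched for the normal group listed in Table 19. The block below
is an illustration only: the cutoff of 3 standard deviations and the
helper name `flag_outliers` are assumptions, and pairing the i-th CHL
value with the i-th TRG value assumes the two listings in Table 19 are in
the same patient order.

```python
import statistics

# Table 19, normal group (51 males), in listing order.
chl = [195, 237, 205, 201, 190, 180, 193, 170, 150, 200,
       228, 169, 178, 251, 234, 222, 116, 157, 194, 130,
       206, 158, 167, 217, 234, 190, 178, 265, 219, 265,
       190, 156, 187, 149, 147, 155, 207, 238, 168, 210,
       208, 160, 243, 209, 221, 178, 289, 201, 168, 162, 207]
trg = [348, 174, 158, 171, 85, 82, 210, 90, 167, 154,
       119, 86, 166, 211, 143, 284, 87, 134, 121, 64,
       99, 87, 177, 114, 116, 132, 157, 73, 98, 485,
       108, 126, 109, 146, 95, 48, 195, 172, 71, 91,
       139, 116, 101, 97, 156, 116, 120, 72, 100, 227, 160]

def flag_outliers(values, cutoff=3.0):
    """Indices whose value lies more than `cutoff` sample standard
    deviations from the mean (the cutoff is an assumed choice)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > cutoff * s]

bad = set(flag_outliers(trg))
# Drop each flagged TRG value together with its paired CHL value.
pairs = [(c, t) for i, (c, t) in enumerate(zip(chl, trg)) if i not in bad]

print(round(statistics.mean(chl), 2), round(statistics.mean(trg), 2))  # 195.14 140.33
print(round(statistics.variance(trg), 2))                              # 5504.91
print([trg[i] for i in sorted(bad)], len(pairs))                       # [485] 50
```

With this cutoff a single TRG value, 485, is flagged, and the means and
the TRG variance of the full listing agree with those reported in
Table 21.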

For this data set (data set H), a BISAM analysis was carried out
for the normal and diseased groups. Scatter plots for these two
groups appear in Figures 30 and 31, and several density estimates were
computed to obtain the shapes depicted in Figures 32 through 36. The
univariate density-quantile plots appear in Figures 37 through 40.
One rejects normality for both TRG data sets. For this analysis, both
normal and diseased groups are classified as bimodal with modes for
the normal group at (190,97) and (206,146), and modes for the diseased
group at (187,120) and (221,145). The two modes for the diseased
group support the results of Scott et al., but the normal results are
contradictory. This analysis was performed for the complete data,
with bimodal TRG densities.

Table 21. Summary of BISAM Output for Coronary Data - Normal

Coefficients for Bivariate Dependence Density (1 outlier omitted):

    NU1   NU2   REAL(COF)   IMAG(COF)
      0    -1    -0.1352     -0.0478
      0     1    -0.1352      0.0478
     -1     0    -0.1061      0.0630
      1     0    -0.1061     -0.0630
     -1     1     0.0315     -0.0563
      1    -1     0.0315      0.0563
     -1    -1    -0.1071      0.0676
      1     1    -0.1071     -0.0676

    Integrating Factor = 1.015

    CHL Mean        195.14       TRG Mean        140.33
    CHL Median      195.14       TRG Median      120.00
    CHL Variance    1308.38      TRG Variance    5504.91
    CHL AR Order    0            TRG AR Order    2

The following are computed with one outlier omitted:

    Pearson r        0.188       H(d-tilda)     -0.026
    Spearman Rho     0.243       H(d8)          -0.099
    Kendall Tau-A    0.166       H(d24)         -0.116
    Kendall Tau-B    0.167       H(d48)         -0.681

Upon eliminating the outliers suggested above, it was discovered


Table 22. Summary of BISAM Output for Coronary Data - Diseased

Coefficients for Bivariate Dependence Density (1 outlier omitted):

    NU1   NU2   REAL(COF)   IMAG(COF)
      0    -1    -0.0489      0.0546
      0     1    -0.0489     -0.0546
     -1     0    -0.0333      0.0221
      1     0    -0.0333     -0.0221
     -1     1     0.1024     -0.0579
      1    -1     0.1024      0.0579
     -1    -1    -0.0148      0.0472
      1     1    -0.0148     -0.0472
      0    -2     0.0441     -0.0498
      0     2     0.0441      0.0498
     -2     0    -0.0198     -0.0026
      2     0    -0.0198      0.0026
     -1     2     0.0831     -0.0029
      1    -2     0.0831      0.0029
     -2     1    -0.0049      0.0821
      2    -1    -0.0049     -0.0821
     -2    -1    -0.0377     -0.0252
      2     1    -0.0377      0.0252
     -1    -2    -0.0411     -0.0269
      1     2    -0.0411      0.0269
     -2     2     0.0560     -0.0118
      2    -2     0.0560      0.0118
     -2    -2     0.0626      0.0230
      2     2     0.0626     -0.0230

    Integrating Factor = 1.009

    CHL Mean        216.19       TRG Mean        179.35
    CHL Median      212.50       TRG Median      150.00
    CHL Variance    1850.04      TRG Variance    10372.6
    CHL AR Order    0            TRG AR Order    1

The following are computed with one outlier omitted:

    Pearson r        0.210       H(d-tilda)     -0.300
    Spearman Rho     0.270       H(d8)          -0.078
    Kendall Tau-A    0.183       H(d24)         -0.105
    Kendall Tau-B    0.184       H(d48)         -0.158



that  the  rekults  for  the  diseased  group  were  fairly  stable,  but  for 
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Figure  27.  Three  Dimensional  Plot  of  Density-Quantile  for 
Normal  Data  with  Rho»0,  0rder»8 
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Figure  30 .  Scatter  Plot  of  Coronary  Data,  Normal 


the  normal  group,  a  unimodal  shape  was  obtained  corresponding  to  that 


of  Scott,  e£  al ..  For  this  case,  a  mode  for  the  normal  group  occurs  at 
095>I22)  and  two  modes  for  the  diseased  group  occur  at  088,120)  and 
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Figure  31*  Scatter  Plot  of  Coronary  Data,  Diseased 


(221, 1*5).  These  results  seem  to  confirm  those  of  Scott,  et  a].., 
Elimination  of  the  two  outliers  served  to  produce  unimodal  univariate 
densities  in  all  cases.  Figures  41  through  45  illustrate  the 
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Figure  32.  Contour  Plot  of  Density-Quantile  for  Coronary  Data 

Normal,  Order-8 


Figure  33.  Three  Dimensional  Plot  of  Density-Quantile  for 
Coronary  Data,  Normal,  0rder»8 
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Figure  j 

)7>  Univariate  Oensi ty-Quanti le  Plot  for 

Variable  CHI,  Normal 

bivariate  shapes  obtained  when  one  outlier  is  removed. 

This analysis illustrates some important points. Bimodal
univariate densities will usually produce multimodal bivariate
densities, but it is possible for the dependence structure for two
unimodal univariate densities to induce multimodal bivariate
densities. The ability of the kernel method and the minimum
information method to discover these modes is crucial to a study
designed to determine if more than one population is represented by a
data set. Furthermore, this analysis illustrates the importance of
univariate density estimation in such "bump-hunting" problems. When
the correlation is small, the univariate densities may dominate the
shaping of the bivariate densities. For the coronary data, the
correlations range from 0.2 to 0.3, indicating small but statistically
significant correlation between the variables CHL and TRG.
Consequently, the univariate density-quantile functions have a
pronounced effect on the bivariate density-quantile, especially for
the normal group. When the outlier is omitted from the normal group,
the shape of the TRG density changes from bimodal to unimodal with a
corresponding change in the bivariate density-quantile. Clearly a
univariate analysis is crucial to bivariate data modeling, a fact that
may be hidden by the kernel approach.

Figure 38. Univariate Density-Quantile Plot for
Variable TRG, Normal

Figure 39. Univariate Density-Quantile Plot for
Variable CHL, Diseased
[Printer plot of fQ against U; listing of plotted values omitted.]

Figure 40. Univariate Density-Quantile Plot for
Variable TRG, Diseased

Figure 45. Three Dimensional Plot of Density-Quantile for
Coronary Data, Diseased, One Outlier Omitted, Order=24

These examples serve to illustrate the competitiveness of minimum
information bivariate density estimation and point out some
fundamental characteristics of a bivariate data modeling approach.
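
The interplay between the univariate density-quantiles and the dependence density can be made concrete with a small sketch. It assumes a bivariate normal dependence structure, for which the dependence density d(u1,u2) has a closed form, and it uses the product representation of the bivariate density-quantile. The function names are illustrative only and are not part of BISAM.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for quantiles and densities


def normal_dependence_density(u1, u2, rho):
    """Dependence density d(u1, u2) of a bivariate normal with correlation rho.

    Assumes 0 < u1, u2 < 1 and -1 < rho < 1.
    """
    x = _STD.inv_cdf(u1)
    y = _STD.inv_cdf(u2)
    one_minus = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(expo) / math.sqrt(one_minus)


def std_normal_fq(u):
    """Density-quantile fQ(u) = f(Q(u)) of the standard normal."""
    return _STD.pdf(_STD.inv_cdf(u))


def bivariate_density_quantile(u1, u2, fq_x, fq_y, rho):
    """fQ_{X,Y}(u1, u2) = fQ_X(u1) * fQ_Y(u2) * d(u1, u2)."""
    return fq_x(u1) * fq_y(u2) * normal_dependence_density(u1, u2, rho)
```

When rho is zero the dependence density is identically one, so the bivariate shape is determined entirely by the two univariate density-quantiles, which is the situation described for weakly correlated data.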

For any exploratory analysis one is especially concerned with
obtaining all of the possible shapes that can effectively model a data
set. The minimum information approach provides such a multitude of
shapes with a minimum amount of the expurgatorial effort that occurs in
such approaches as the kernel method. However, the approach would be
aided by an objective procedure to choose among the orders of
approximation. This is left as a subject for further study with the
failures noted herein serving as a catapult for future research.

7. CONCLUSION

7.1  Concluding  Remarks 

In  the  dissertation  we  have  shown  the  utility  of  the  function 
approximation  approach  to  density  estimation.  The  representation  of 
an  unknown  density  as  a  truncated  parametric  model  that  is 
nonparametric in its assumptions for the data provides a useful tool
for  an  exploratory  analysis  of  a  data  set.  The  convergence  of 
infinite  series  representations  is  usually  very  rapid  so  that 
truncated  series  provide  good  approximations  for  functions  of 
interest.  The  statistical  problem  involves  estimating  the  parameters 
of  the  expansion  in  a  stochastic  setting.  Unfortunately,  an  exact 
stochastic  model  to  aid  in  estimation  is  not  always  available,  and 
asymptotic  results  may  only  be  meaningful  for  very  large  samples. 
Consequently,  the  properties  of  estimates  are  difficult  to  investigate 
in  such  a  general  setting  and  hence  restrict  the  applicability  of  such 
approaches  to  inferential  statistics.  Simulation  studies  and 
experience  in  the  application  of  expansion  techniques,  however, 
support  their  use  in  the  absence  of  exact  theory. 

Heuristic  motivations  are  given  by  considering  finite  exponential 
models  and  their  relationship  to  Fourier  series  in  Hilbert  space. 
Analogies  to  classical  regression  and  time  series  analysis  also 
provide heuristic support to this approach. The development of
information functionals then serves to promote the regression analogy
by suggesting least squares estimation of parameters. Information
theory plays a dual role in providing criterion functionals and
parameters that represent important population characteristics. One
suspects that the observation of Berkson (1980) concerning the
selection of estimation criteria has a more general interpretation in
terms of information functionals. The value of information theory as
applied to statistics has yet to be fully realized, but works such as
this one should motivate further research into the area.

In this work we have presented some fundamental mathematical and
statistical concepts useful to the study of bivariate data modeling.
The analysis of a variety of nonparametric density estimation
procedures suggested problems to be overcome and motivated the
development of a new technique that competes with existing procedures
and extends easily to the bivariate case. Classical inference
procedures for testing for independence were explored with suggestions
made for the development of entropy statistics that measure
association between two random variables. An information parameter
was derived that represented a measure of association, but
satisfactory estimators of the parameter were not obtained. A
bivariate data modeling approach was investigated and promising
results obtained for the problem of density estimation and
"bump-hunting" in a multivariate setting. The development of
univariate and bivariate density estimation programs provided
interesting comparisons and permitted the exploratory analysis of a
variety of data sets. The program BISAM permits application of the
data modeling results obtained and provides a useful computing tool to
the applied statistician. The BISAM program forms a computing triad
with ONESAM (Parzen and Anderson, 1980) and TWOSAM (Prihoda, 1981)
that may serve as a valuable tool to the data analyst. The
nonparametric and robust procedures available from these programs are
difficult to obtain from one source and overcome the weaknesses of
most statistical packages in the areas of data analysis and density
estimation. This contribution should motivate the application of
nonparametric procedures to statistical problems and should support
the continuing research into multivariate nonparametric methods.

7.2  Problems  for  Further  Study 

The  entropy  estimates  of  section  4.5  were  disappointing  in 
comparison  with  existing  measures  of  association,  but  it  is  felt  that 
better estimates of the parameter H(d) will make this an important
approach in testing for independence. Furthermore, entropy
diagnostics should be valuable tools to many areas of bivariate data
modeling. For example, goodness-of-fit tests are readily suggested by
examining the information between a null hypothesis density and a
nonparametric density estimate. The utility of information
functionals  spreads  across  many  areas  of  application.  A 
characterization  problem  might  be  aided  by  information  functionals, 
with  an  open  research  question  being  whether 

    I(f_{X,Y}; f_X f_Y) = -\frac{1}{2} \log(1 - \rho^2)    (7.2.1)

is a defining characteristic of bivariate normality or whether it
defines a more general class of distributions. If this equation
characterized bivariate normality, an entropy based test of bivariate
normality could be developed using Pearson's r to estimate the normal
entropy and using the estimated dependence density to form an entropy
statistic. The difference of these statistics would then be a version
of the information divergence discussed in section 2.5.
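
The two ingredients of such a test can be sketched as follows. This is only an illustration under stated assumptions: the normal entropy is taken to be one half of log(1 - r^2), i.e. the negative of the information between a bivariate normal density and the product of its marginals, and the entropy estimate of the dependence density (`h_d_hat`) is assumed to be supplied from elsewhere, for example from one of the H(d) estimates printed by BISAM. The function names are hypothetical, and no distribution theory for the resulting statistic is implied.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_information(r):
    """Information between a bivariate normal and the product of its marginals."""
    return -0.5 * math.log(1.0 - r * r)


def normal_entropy(r):
    """Entropy of the normal dependence density (assumed sign: minus the information)."""
    return -normal_information(r)


def normality_discrepancy(x, y, h_d_hat):
    """Difference between a supplied entropy estimate and the normal entropy."""
    return h_d_hat - normal_entropy(pearson_r(x, y))
```

A discrepancy near zero would be consistent with a normal dependence structure; how far from zero it must be to reject is exactly the open distributional question raised in this section.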

The use of subjective criteria in model selection should motivate
the study of objective techniques for choosing a model over
competitors. One is motivated to consider a function of the sample
entropy to evaluate the contribution of additional terms in an
expansion. A version of maximum entropy analogous to the AIC or CAT
criterion functions could then be developed to arrive at an optimal
order of approximation.
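
As a point of reference for such a criterion, the AIC-type rule already used in subprogram MIDEN (Appendix A) can be sketched in a few lines. The sketch assumes that `rvar[k-1]` holds the residual variance of the best order-k regression and that n is the sample size; it omits the special treatment MIDEN gives to order zero and the relative-minimum search performed by RELMIN. The function names are illustrative.

```python
import math


def aic_values(rvar, n):
    """AIC(k) = log(rvar[k-1]) + 2k/n for orders k = 1, ..., len(rvar)."""
    return [math.log(v) + 2.0 * k / n for k, v in enumerate(rvar, start=1)]


def best_order(rvar, n):
    """Order k (1-based) that minimizes the AIC-type criterion."""
    aic = aic_values(rvar, n)
    return min(range(1, len(aic) + 1), key=lambda k: aic[k - 1])
```

For example, `best_order([0.5, 0.3, 0.29, 0.289], 50)` selects order 2: the later terms reduce the residual variance too little to offset the 2k/n penalty.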

The  distribution  theory  for  entropy  statistics  remains  to  be 
investigated,  with  motivation  provided  by  recent  papers  by  Stute 
(1982) and Taniguchi (1980). The asymptotic theory in the univariate
case  for  the  minimum  information  density  estimate  needs  to  be 
resolved,  indicating  the  need  for  continued  research  into  the 
distribution  theory  of  non-standard  regression  models. 

Multivariate  robustness  is  also  a  topic  of  interest,  with  papers 
by Gnanadesikan and Kettenring (1972) and Green (1981) suggesting
research  questions  for  the  problem  of  trimming  a  multivariate  data  set 
of  outliers.  The  computer  implementation  of  data  trimming  procedures 
is  also  of  interest. 

Finally, one is interested in discovering bivariate distributions
for which the classical nonparametric measures of association perform
poorly. Further research may then be motivated into the numerical
approximation of conditional quantile functions of interest to be used
in simulation studies. The problem of simulating data corresponding
to existing data as a validity check of an analysis has been
approached from a conditional quantile perspective, implying the need
for good estimates of the conditional quantile function. Such
estimators have yet to be proposed, but one may approach the problem
using the function approximation techniques suggested by this work.
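
The conditional quantile approach to simulation can be illustrated in the one case where the conditional quantile function is known in closed form. The sketch below assumes a standard bivariate normal with correlation rho, for which the conditional quantile of Y given X = x is rho*x + sqrt(1 - rho^2) times the standard normal quantile. For other distributions this function would have to be estimated, which is the open problem described above. All names are illustrative.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def conditional_quantile_normal(u, x, rho):
    """Q_{Y|X=x}(u) for a standard bivariate normal with correlation rho."""
    return rho * x + math.sqrt(1.0 - rho * rho) * _STD.inv_cdf(u)


def _open_uniform(rng):
    """Uniform draw on the open interval (0, 1), as inv_cdf requires."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return u


def simulate_pairs(n, rho, seed=0):
    """Simulate n pairs: X = Q_X(U1), then Y = Q_{Y|X}(U2 | X)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = _STD.inv_cdf(_open_uniform(rng))
        y = conditional_quantile_normal(_open_uniform(rng), x, rho)
        pairs.append((x, y))
    return pairs
```

Replacing `conditional_quantile_normal` with an estimated conditional quantile function would give simulated data that mimic an observed sample, which could then serve as the validity check mentioned in the text.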
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APPENDIX  A 


A.1  Subprogram MIDEN


      SUBROUTINE MIDEN(X,N,A,B,CAPT,IMOD)
C*************************************************************
C
C     SUBROUTINE TO PERFORM NONPARAMETRIC DENSITY ESTIMATION USING
C     LEAST SQUARES REGRESSION MINIMUM INFORMATION TECHNIQUE.
C     LEGENDRE POLYNOMIALS ARE EMPLOYED FOR THE ORTHOGONAL EXPANSION
C
C     INPUT:  X,N  - DATA AND SAMPLE SIZE
C             A,B  - MIN AND MAX DATA VALUES
C             CAPT - LABEL FOR X
C
C     OUTPUT: PLOTS AND DESCRIPTIVE STATISTICS
C
C     SUBPROGRAMS CALLED: PLOTXY,FTERP,MAX,MIN,
C             SEQREG,LEGP,NNDEN,SWEEPD,RELMIN,CLPLT1
C
C*************************************************************
      DIMENSION CAPT(20),X(N),Y(150),FNN(150),FMI(150)
      DIMENSION CBAR(13),AIC(13)
      DIMENSION THETA(12),P(3,12),COF(12,12),IORD(12),BEST(78,3),
     +RVAR(12),NVAR(12),COFF(11),MORD(5),XNORD(5)
      DOUBLE PRECISION COV(13,13)
      DATA NAIC/4H AIC/
      DATA NTHT/4HTHET/
      DATA NCOF/4H COF/
      DATA NAMX/4H X  /
      DATA NAMFN/4H FNN/
      DATA NAMFM/4H FMI/
      DATA NRVR/4HRVAR/
      DATA IORD/2,3,4,5,6,7,8,9,10,11,12,1/
      DATA MORD/2,3,6,11,8/
      DATA XNORD/4H-2  ,4H-3  ,4H-6  ,4H-11 ,4H-AIC/
      DATA XMN/4HM(N)/
      CAPT(13)=XMN
      XN=FLOAT(N)
      MN=10
      RANGE=B-A
      IDEG=0
      CALL LEGP(12,P,IDEG)
      DO 40 K=1,3
      DO 40 J=1,12
      COF(K,J)=P(K,J)
   40 CONTINUE
      DO 50 K=4,12
      CALL LEGP(12,P,IDEG)
      DO 50 J=1,12
      COF(K,J)=P(3,J)
   50 CONTINUE
      DO 240 ITER1=1,2
      DO 25 I=1,N
      Y(I)=X(I)
   25 CONTINUE
      IF (ITER1.EQ.2) MN=20
      CALL NNDEN(X,Y,N,N,MN,RANGE,FNN)
      CALL PLOTXY(Y,FNN,100,CAPT,NAMX,NAMFN,1)
      DO 55 I=1,N
      FNN(I)=ALOG(FNN(I))
   55 CONTINUE
      DO 90 K=1,12
      CBAR(K)=0.0
      DO 80 J=1,K
      COV(K,J)=0.0
      DO 70 I=1,N
      ARGK=COF(K,1)
      ARGJ=COF(J,1)
      IF(K.EQ.1) GO TO 61
      DO 60 L=2,K
      ARGK=ARGK+COF(K,L)*((2.0*(X(I)-A)/RANGE-1.0)**(L-1))
      ARGJ=ARGJ+COF(J,L)*((2.0*(X(I)-A)/RANGE-1.0)**(L-1))
   60 CONTINUE
   61 CONTINUE
      IF(J.EQ.1) CBAR(K)=CBAR(K)+ARGK
      COV(K,J)=COV(K,J)+ARGK*ARGJ
   70 CONTINUE
      IF(J.EQ.1) CBAR(K)=CBAR(K)/XN
      COV(K,J)=COV(K,J)/XN-CBAR(K)*CBAR(J)
      IF(K.NE.J) COV(J,K)=COV(K,J)
   80 CONTINUE
   90 CONTINUE
      CBAR(13)=0.0
      DO 120 J=1,12
      COV(13,J)=0.0
      DO 110 I=1,N
      ARGJ=COF(J,1)
      DO 100 L=2,J
      ARGJ=ARGJ+COF(J,L)*((2.0*(X(I)-A)/RANGE-1.0)**(L-1))
  100 CONTINUE
      IF(J.EQ.1) CBAR(13)=CBAR(13)+FNN(I)
      COV(13,J)=COV(13,J)+ARGJ*FNN(I)
  110 CONTINUE
      IF(J.EQ.1) CBAR(13)=CBAR(13)/XN
      COV(13,J)=COV(13,J)/XN-CBAR(J)*CBAR(13)
      COV(J,13)=COV(13,J)
  120 CONTINUE
      COV(13,13)=0.0
      DO 130 I=1,N
      COV(13,13)=COV(13,13)+FNN(I)*FNN(I)
  130 CONTINUE
      COV(13,13)=COV(13,13)/XN-CBAR(13)*CBAR(13)
      DO 140 I=1,N
      FNN(I)=EXP(FNN(I))
  140 CONTINUE
      CALL SEQREG(COV,13,12,IORD,BEST,78,RVAR,NVAR,NIVN)
      CALL CLPLT1(RVAR,NIVN,1,NRVR,41,1)
      NV1=NIVN+1
      AIC(1)=-1.0/XN
      AIC(2)=ALOG(RVAR(1))+2.0/XN
      DO 150 I=2,NIVN
      AIC(I+1)=ALOG(RVAR(I))+2.0*FLOAT(I)/XN
  150 CONTINUE
      DEC=1./XN
      CALL RELMIN(AIC,NV1,DEC,MIN1,MIN2,NAIC)
      MIN1=MIN1-1
      MIN2=MIN2-1
      MORD(5)=MIN1
      DO 230 ITER=1,5
      KORD=MORD(ITER)
      CAPT(14)=XNORD(ITER)
      K=NVAR(KORD)
      DO 160 I=1,KORD
      THETA(I)=BEST(K,2)
      K=K+1
  160 CONTINUE
      CALL CLPLT1(THETA,KORD,1,NTHT,41,1)
      DO 180 K=2,KORD
      COFF(K-1)=0.0
      DO 170 J=2,KORD
      COFF(K-1)=COFF(K-1)+COF(J,K)*THETA(J-1)
  170 CONTINUE
  180 CONTINUE
      CALL CLPLT1(COFF,KORD-1,1,NCOF,41,1)
      CALL EQSPY(A,B,100,Y)
      FSUM=0.0
      FMODE=0.
      XMODE=999.
      FMEAN=0.0
      FVAR=0.0
      DO 200 I=1,100
      FMI(I)=0.0
      DO 190 L=2,KORD
      FMI(I)=FMI(I)+COFF(L-1)*((2.0*(Y(I)-A)/RANGE-1.0)**(L-1))
  190 CONTINUE
      FMI(I)=EXP(FMI(I))
      FSUM=FSUM+FMI(I)
      FMEAN=FMEAN+Y(I)*FMI(I)
      FVAR=FVAR+Y(I)*Y(I)*FMI(I)
      IF (FMODE.GT.FMI(I)) GO TO 200
      FMODE=FMI(I)
      XMODE=Y(I)
  200 CONTINUE
      FSUM=FSUM*RANGE/100.0
      FMEAN=FMEAN*RANGE/(FSUM*100.0)
      FMODE=FMODE/FSUM
      FVAR=FVAR*RANGE/(FSUM*100.0)
      FVAR=FVAR-FMEAN*FMEAN
      DTEST=0.0
      DO 210 I=1,N
      FMI(I)=FMI(I)/FSUM
      DTEST=DTEST+ABS(FMI(I)-FNN(I))
  210 CONTINUE
      DTEST=DTEST/XN
      WRITE (6,220) DTEST
  220 FORMAT (/,10X,'AVE OF SUM OF ABS(FMI-FNN)',F10.4,/)
      WRITE (6,284)
  284 FORMAT(1H1,7(/))
      WRITE (6,290) FMEAN,FVAR,FMODE,XMODE,FSUM
  290 FORMAT (///,13X,' + ',108(1H-),' + ',/,13X,' | PARAMETER ',
     +'FUNCTIONALS: MEAN =',F10.4,', VARIANCE =',F10.4,', MODE =',
     +F10.4,' AT X =',F10.4,' |',/,13X,' | INTEGRATING FACTOR =',
     +F10.4,75X,'|',/,13X,' + ',108(1H-),' + ',/)
      CALL PLOTXY(Y,FMI,100,CAPT,NAMX,NAMFM,1)
  230 CONTINUE
  240 CONTINUE
      RETURN
      END


A.2  Subprogram CMPDEN


      SUBROUTINE CMPDEN(W,N,CAPT,THETA,INDW,NVW,ISORT)
C***************************************************************
C
C     THIS SUBPROGRAM COMPUTES A SMOOTHED DENSITY QUANTILE
C     FUNCTION BASED ON ORTHOGONAL EXPANSION IN TERMS OF
C     COMPLEX EXPONENTIALS.  A NEAREST NEIGHBOR ESTIMATE IS
C     OBTAINED AS A RAW DENSITY QUANTILE AND THEN A SEQUENTIAL
C     REGRESSION ROUTINE IS APPLIED WITH THE RAW D-Q TREATED
C     AS A DEPENDENT VARIABLE.  A COMPLEX SWEEP OPERATOR IS THEN
C     USED TO OBTAIN COEFFICIENTS AND RESIDUAL VARIANCES FOR
C     VARIOUS ORDERS, AND FINALLY PARZEN'S CAT CRITERION IS
C     USED TO CHOOSE THE 'BEST' EXPANSION.
C
C     INPUT:  W     - RAW DATA
C             N     - SAMPLE SIZE
C             ISORT - 0 IF W SORTED, 1 OTHERWISE.
C
C     OUTPUT: NVW   - ORDER OF EXPANSION CHOSEN (NUMBER OF INDEP. VAR.)
C             THETA - COEFFICIENTS IN EXPANSION
C             INDW  - VECTOR OF INDICES CORRESPONDING TO COEFFICIENTS
C                     CHOSEN
C
C     SUBROUTINES CALLED: CSQREG,CSWEEP
C
C***************************************************************
      DIMENSION W(N),FMC(200),THETA(21),INDW(21),BEST(231,3),
     +RVAR(21),NVAR(21),IORDI(21),IORD(21),FNN(200),T(200),FHO(200)
      DIMENSION CAPT(20),MORD(3),CORD(3)
      COMPLEX PHI(22,22),PHIBAR(21),ZARG,CEXP,CONJG,CMPLX
      DATA IORDI/11,10,12,9,13,8,14,7,15,6,16,5,17,4,18,3,19,2,20,1,21/
      DATA LVAR,LTHT/4HRVAR,4HTHET/
      DATA NAMX/4H   X/
      DATA NAMFN,NAMFC/4H FNN,4H FMC/
      DATA NAMM/4H FHO/
      DATA MORD/2,8,15/
      DATA CORD/4H-2  ,4H-8  ,4H-15 /
      DATA CMN/4HM(N)/
      CAPT(13)=CMN
      DO 1 I=1,21
      IORD(I)=IORDI(I)
    1 CONTINUE
      DO 5 I=1,N
      T(I)=W(I)
    5 CONTINUE
      N2=N-2
      M=21
      M1=M+1
      MID=11
      TWOPI=8.0*ATAN(1.0)
      DENOM=1./FLOAT(N+1)
      IF (ISORT.EQ.0) GO TO 10
      CALL ORD(T,N)
      CALL ORD(W,N)
   10 CONTINUE
      A=T(1)-DENOM
      B=T(N)+DENOM
      RANGE=B-A
C
C     COMPUTE NEAREST NEIGHBOR DENSITY ESTIMATE
C
      DO 200 ITER1=1,2
      MN=8
      IF (ITER1.EQ.2) MN=15
      CAPT(14)=CORD(2)
      IF (ITER1.EQ.2) CAPT(14)=CORD(3)
      CALL NNDEN(T,W,N,N,MN,RANGE,FNN)
      CALL PLOTXY(T,FNN,N,CAPT,NAMX,NAMFN,1)
      FBAR=0.0
      DO 40 I=3,N2
      FBAR=FBAR+FNN(I)
   40 CONTINUE
      FBAR=FBAR/FLOAT(N-4)
      DO 60 I=1,M
      II=I-MID
      PHIBAR(I)=CMPLX(0.,0.)
      DO 50 K=3,N2
      ARG=TWOPI*FLOAT(II)*(T(K)-A)/RANGE
      ZARG=CMPLX(0.,ARG)
      PHIBAR(I)=PHIBAR(I)+CEXP(ZARG)
   50 CONTINUE
      PHIBAR(I)=PHIBAR(I)/FLOAT(N-4)
   60 CONTINUE
      DO 90 I=1,M
      DO 80 J=1,I
      II=I-MID
      JJ=J-MID
      PHI(I,J)=CMPLX(0.,0.)
      DO 70 K=3,N2
      ARG=TWOPI*(FLOAT(II)-FLOAT(JJ))*(T(K)-A)/RANGE
      ZARG=CMPLX(0.,ARG)
      PHI(I,J)=PHI(I,J)+CEXP(ZARG)
   70 CONTINUE
      PHI(I,J)=PHI(I,J)/FLOAT(N-4)-PHIBAR(I)*CONJG(PHIBAR(J))
      PHI(J,I)=CONJG(PHI(I,J))
   80 CONTINUE
   90 CONTINUE
      DO 110 I=1,M
      II=I-MID
      PHI(M1,I)=CMPLX(0.,0.)
      DO 100 K=3,N2
      ARG=-TWOPI*FLOAT(II)*(T(K)-A)/RANGE
      ZARG=CMPLX(0.,ARG)
      PHI(M1,I)=PHI(M1,I)+FNN(K)*CEXP(ZARG)
  100 CONTINUE
      PHI(M1,I)=PHI(M1,I)/FLOAT(N-4)-FBAR*CONJG(PHIBAR(I))
      PHI(I,M1)=CONJG(PHI(M1,I))
  110 CONTINUE
      PHI(M1,M1)=CMPLX(0.,0.)
      DO 120 K=3,N2
      PHI(M1,M1)=PHI(M1,M1)+FNN(K)*FNN(K)
  120 CONTINUE
      PHI(M1,M1)=PHI(M1,M1)/FLOAT(N-4)-FBAR*FBAR
      CALL CSQREG(PHI,22,M,IORD,BEST,231,RVAR,NVAR,NIVN)
      CALL CLPLT1(RVAR,NIVN,1,LVAR,41,1)
      NV1=NIVN+1
      DO 190 ITER=1,3
      NVW=MORD(ITER)
      CAPT(16)=CORD(ITER)
      K1=NVAR(NVW)
      DO 140 K=1,NVW
      INDW(K)=IFIX(BEST(K1,1)+0.5)
      THETA(K)=BEST(K1,2)
      K1=K1+1
  140 CONTINUE
      WRITE (6,150)
  150 FORMAT (1H1)
      CALL CLPLT1(THETA,NVW,1,LTHT,41,1)
      FSUM=0.0
      DO 170 I=1,50
      FMC(I)=0.0
      T(I)=A+FLOAT(I)*RANGE/50.0
      DO 160 K=1,NVW
      ARG=TWOPI*FLOAT(INDW(K)-11)*(T(I)-A)/RANGE
      FMC(I)=FMC(I)+THETA(K)*COS(ARG)
  160 CONTINUE
      FMC(I)=EXP(FMC(I))
      FSUM=FSUM+FMC(I)
  170 CONTINUE
      FSUM=FSUM*RANGE/50.0
      DO 180 I=1,50
      FMC(I)=FMC(I)/FSUM
      FHO(I)=.5*XNORM(0.,1.,T(I))+.5*XNORM(3.,.5,T(I))
  180 CONTINUE
      CALL SEDIAG(FHO,FMC,50,RANGE,XISE,XMAXD,XMSE)
      CALL PLTXYZ(T,FMC,FHO,50,CAPT,NAMX,NAMFC,NAMM,XISE,XMAXD,XMSE)
  190 CONTINUE
  200 CONTINUE
      RETURN
      END


A.3  Two Step FORTRAN-SAS Program Duplicating MIDEN


In the following listing, items appearing in lowercase represent
options depending on the system and the intended application.


// job card
// optional operating system cards
//STEP1 EXEC FORTX,REGION=512K      <-- one step FORTRAN procedure
//FT01F001 DD DSN=WYL. scratch file name
//SYSLIB DD
// DD
// DD
// DD
// DD name of user subroutine library (TIMESBOARD in this case)
//SOURCE DD *
C
C     PROGRAM TO PERFORM NONPARAMETRIC DENSITY ESTIMATION USING
C     LEAST SQUARES REGRESSION.  DATA SET WRITTEN FOR USE BY SAS GLM.
C     SEE WOODFIELD DISSERTATION FOR MORE INFORMATION.
C
      DIMENSION CAPT(20),L(5),X(150),FNN(150),Y(150)
      DIMENSION THETA(12),P(3,12),COF(12,12)
C
C     READ DATA SET
C
      READ (5,10) CAPT
   10 FORMAT (20A4)
      READ (5,20) N,L
   20 FORMAT (I5,4X,5A4)
      READ (5,L) (X(I),I=1,N)
      DO 25 I=1,N
      Y(I)=X(I)
   25 CONTINUE
      A=X(1)
      B=A
      DO 30 I=2,N
      IF (X(I).GT.B) B=X(I)
      IF (X(I).LT.A) A=X(I)
   30 CONTINUE
      A=A-1.0/FLOAT(N)
      B=B+1.0/FLOAT(N)
      RANGE=B-A
      IDEG=0
      CALL LEGP(12,P,IDEG)
      DO 40 K=1,3
      DO 40 J=1,K
      COF(K,J)=P(K,J)
   40 CONTINUE
      DO 50 K=4,12
      CALL LEGP(12,P,IDEG)
      DO 50 J=1,K
      COF(K,J)=P(3,J)
   50 CONTINUE
      CALL NNDEN(X,Y,N,N,10,FNN)
      DO 70 I=1,N
      DO 60 K=2,12
      THETA(K)=COF(K,1)
      DO 60 J=2,K
      THETA(K)=THETA(K)+COF(K,J)*((2.0*(X(I)-A)/RANGE-1.0)**(J-1))
   60 CONTINUE
      WRITE (1,65) FNN(I),(THETA(K),K=2,12)
   65 FORMAT(1X,7F10.5,/,1X,5F10.5)
   70 CONTINUE
      STOP
      END
      SUBROUTINE LEGP(N,P,IDEG)

         code for subprogram LEGP

      SUBROUTINE NNDEN(X,Y,N,M,MN,FNN)

         code for subprogram NNDEN

//SYSIN DD *
   data goes here
//*
//STEP2 EXEC SAS
//ONE DD DSN=WYL. description of tape 1 above where output
//       of STEP1 was written
DATA TWO; INFILE ONE;
INPUT FNN X1-X6 #2 X7-X11;
Y=LOG(FNN);
CARDS;
PROC GLM;
MODEL Y = variable listing for variables to be included in model / P;
OUTPUT OUT=NEW PREDICTED=YP;
DATA THREE; SET NEW;
FHAT=EXP(YP);
PROC PLOT DATA=THREE; PLOT FHAT*X1='*';


APPENDIX B


BISAM: A Program for Bivariate Data Analysis


C
C     PROGRAM BISAM
C
C********************************************************
C
C     DRIVER PROGRAM FOR BIVARIATE DATA ANALYSIS
C
C     THIS PROGRAM PRODUCES SCATTER PLOTS, DESCRIPTIVE STATISTICS,
C     AND CORRELATION STATISTICS FOR A SET OF BIVARIATE DATA.
C     BIVARIATE DENSITY ESTIMATION IS PERFORMED ON THE DEPENDENCE
C     DENSITY USING MINIMUM BI-INFORMATION TECHNIQUES.
C     A 40X40 GRID OF DENSITY AND DENSITY QUANTILE VALUES IS
C     WRITTEN TO TAPES 1 THROUGH 3 FOR THE ORDERS 8, 24, AND 48
C     TO BE USED FOR GRAPHICAL OUTPUT USING SAS/GRAPH.
C     SEE WOODFIELD DISSERTATION FOR MORE INFORMATION.
C
C     INPUT:  NTAPE - TAPE WHERE DATA SET RESIDES
C             (X,Y) - BIVARIATE DATA (INDIVIDUALLY, X FIRST)
C             MORD  - MAXIMUM AUTOREGRESSIVE ORDER TO BE USED FOR
C                     UNIVARIATE AR DENSITY ESTIMATION (<=6)
C         IDQX,IDQY - NULL DISTRIBUTIONS FOR AUTOREGRESSIVE SMOOTHING
C             IPLT1 - 0 --> NO SCATTER PLOTS
C                     1 --> SCATTER PLOT OF DATA
C                     2 --> SCATTER PLOT OF RANK TRANSFORMED DATA
C                     3 --> BOTH SCATTER PLOTS
C             IPLT2 - 0 --> NO AUTOREGRESSIVE DENSITY PLOTS
C                     1 --> BEST ORDER AR DENSITY PLOTS
C             IPLT3 - 0 --> NO QUANTILE BOX PLOTS
C                     1 --> QUANTILE BOX PLOTS FOR BOTH X AND Y
C             IDST  - 0 --> NO UNIVARIATE DESCRIPTIVE STATISTICS
C                     1 --> UNIVARIATE DESCRIPTIVE STATISTICS FOR X AND Y
C             KDEL  - MAXIMUM NUMBER OF EXTREME POINTS TO EXCLUDE FROM
C                     BIVARIATE ANALYSIS
C
C     SUBPROGRAMS CALLED: DATAIN,RANK,ORD2,PEARSN,SPRMN,PPLOT,TRIM,
C             KENDAL,CMPINF,CPTENT,RELMIN,MIN,PLOTXY,FTERP,
C             MINMAX,CSQREG,CSWEEP,AUTDEN,ORD,QHLIN,
C             QTOFQ,WSPACE,FORIER,AUTORG,PARZ,AREST,
C             FQFNC,MDNRIS,QFIND,MAX,CLPLT1,DESTAT,QPLOT,
C             EXPAND
C
C********************************************************
C
      COMMON X(500),Y(500),RANKX(500),RANKY(500)
      DIMENSION L(5),LABY(20),LABX(20),T(500,2),HD(4)
      DIMENSION CHAR(5),XNAME(20),YNAME(20),CAPT(20),CRNK(6)
      DIMENSION W(500),SA(1003),GE(1003)
      EQUIVALENCE (T(1,1),X(1))
      EQUIVALENCE (T(1,2),Y(1))
      DATA NOUT,NIN/6,5/
      DATA CHAR/1H*,1H+,1HX,1H#,1H./
      DATA XNAME/10*1H ,1HX,9*1H /
      DATA YNAME/10*1H ,1HY,9*1H /
      DATA CAPT/4HSCAT,4HTER ,4HPLOT,4H OF ,4HX VS,4H. Y ,14*4H    /
      DATA CRNK/4H- RA,4HNK T,4HRANS,4HFORM,4HATIO,4HN   /
      WRITE (NOUT,1)
    1 FORMAT (1H1)
      READ (NIN,10) NTAPE,IDQX,IDQY,MORD,IPLT1,IPLT2,IPLT3,IDST,KDEL
   10 FORMAT (9I5)
      WRITE (NOUT,20)
   20 FORMAT (//,10X,20(4H****),/,10X,'*',78X,'*',/,10X,'* BISAM ',
     +'- BIVARIATE DATA ANALYSIS USING FOURIER EXPANSIONS',19X,'*',
     +/,10X,'* AND QUANTILE TECHNIQUES',44X,'*',/,10X,'*',
     +78X,'*',/,10X,20(4H****))
      CALL DATAIN(NTAPE,X,NX,L,LABX)
      CALL DATAIN(NTAPE,Y,NY,L,LABY)
      N=NX
      IF (NX.EQ.NY) GO TO 40
      WRITE (NOUT,30) LABX,LABY
   30 FORMAT (1H ,10X,20A4/,10X,20A4,//,10X,'SAMPLE SIZES NOT EQUAL.',
     +' BIVARIATE ANALYSIS INAPPROPRIATE.  EXECUTION TERMINATED.')
      STOP
   40 WRITE (NOUT,50) LABX,LABY,N
   50 FORMAT (1H ,9X,20A4/,10X,20A4,//,10X,'N=',I5)
      IF ((IPLT1.EQ.1).OR.(IPLT1.EQ.3))
     +CALL PPLOT(X,Y,500,N,1,CHAR,CAPT,XNAME,YNAME,0)
      WRITE (NOUT,1)
C
C     ORDER BIVARIATE DATA BY X VALUES
C
      CALL ORD2(T,N,500)
      IF (IDST.EQ.0) GO TO 58
      NN1=2*N+1
      DO 51 I=1,N
      W(I)=X(I)
   51 CONTINUE
      CALL DESTAT(W,N,LABX,IPLT3,XMED,SA,GE,NN1)
      DO 52 I=1,N
      W(I)=Y(I)
   52 CONTINUE
      CALL ORD(W,N)
      CALL DESTAT(W,N,LABY,IPLT3,YMED,SA,GE,NN1)
C
C     TRIM DATA SET OF AT MOST KDEL EXTREME POINTS
C
   58 CALL TRIM(X,Y,XMED,YMED,KDEL,N,NEWN)
      N=NEWN
C
C     OBTAIN RANKS OF X AND Y VALUES
C
      CALL RANK(X,N,RANKX)
      CALL RANK(Y,N,RANKY)
C
C     COMPUTE CORRELATION COEFFICIENTS
C
      CALL SPRMN(N,RHO,SUMD)
      CALL KENDAL(N,TAUA,TAUB,SOMER,NC,ND,NIND,NDEP,NPAIRS)
      CALL PEARSN(N,R)
      CALL CMPINF(N,MORD,IDQX,IDQY,IPLT2,HD)
C
C     WRITE VALUES OF CORRELATION COEFFICIENTS
C
      IF ((NIND.EQ.0).AND.(NDEP.EQ.0)) GO TO 59
      WRITE (NOUT,55) NIND,NDEP
   55 FORMAT (//,10X,'TIES IN X =',I4,', TIES IN Y =',I4,//)
   59 WRITE (NOUT,50) LABX,LABY,N
      WRITE (NOUT,60)
   60 FORMAT (10X,'   PEARSON  SPEARMAN KENDALL A KENDALL B',
     +'   SOMER D  H(D-TIL)     H(D8)    H(D24)    H(D48)',
     +/,10X,10(9H ........))
      WRITE (NOUT,70) R,RHO,TAUA,TAUB,SOMER,HD(4),HD(1),HD(2),HD(3)
   70 FORMAT (10X,9F10.4)
      DO 80 I=1,N
      X(I)=RANKX(I)/FLOAT(N+1)
      Y(I)=RANKY(I)/FLOAT(N+1)
   80 CONTINUE
      DO 90 I=1,6
      CAPT(I+6)=CRNK(I)
   90 CONTINUE
      IF ((IPLT1.EQ.2).OR.(IPLT1.EQ.3))
     +CALL PPLOT(X,Y,500,N,1,CHAR,CAPT,XNAME,YNAME,0)
C
      STOP
      END

SUBROUTINE  CMP  INF  (N.MORD, IDQX, I DQY, IPLT2.HD) 

C*********************************************************** 

c 

c  SUBROUTINE  TO  COMPUTE  COVARIANCE  MATRIX  OF  COMPLEX 
C  EXPONENTIAL  "SUFFICIENT  STATISTICS"  TO  BE  USED  IN 
C  SEQUENTIAL  REGRESSION  ROUTINE  TO  OBTAIN  "BEST  REGRESSION" 

C  MODELS  FOR  ORDERS  1  THROUGH  M*M.  VARIOUS  ORDER  DETERMINING 
C  CRITERION  ARE  COMPUTED  AND  DISPLAYED  VIA  SUBROUTINE  CPTENT. 

C  THE  BIVARIATE  DENSITY  QUANTILE  IS  FORMED  BY  TAKING  THE  PRODUCT 

C  OF  THE  ESTIMATED  DEPENDENCE  DENSITY  AND  THE  UNIVARIATE 
C  AUTOREGRESSIVE  ESTIMATORS. 

C 
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C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


INPUT:  RANKX.RANKY  -  VECTORS  CONTAINING  RANKS  OF  X  AND  CORANKS 

OF  Y 

X , Y  -  BIVARIATE  DATA 
N  -  SAMPLE  SIZE 

IDQX.IOQY  -  NULL  UNIVARIATE  DENSITY  INDICATORS 
MORD  -  MAXIMUM  ORDER  FOR  AR  DENSITY  EST.  PROCEDURE 
IPLT2  -  PLOTTING  OPTION  FOR  UNIVARIATE  DENSITY  QUANTILES 

OUTPUT:  PHI  -  COVARIANCE  MATRIX 

FQX.FQY  -  UNIVARIATE  DENSITY  QUANTILE  FUNCTIONS 
DQHAT  -  BIVARIATE  DENSITY  QUANTILE  FUNCTION 
HD  -  VECTOR  OF  ENTROPY  ESTIMATORS:  1  -  ORDER  8 

2  -  ORDER  24 

3  -  ORDER  48 

4  -  RAW  (FROM  D-TILDA) 

NOTE:  FQX.FQY  ARE  NOT  PASSED  BACK  TO  THE  CALLING  PROGRAM. 

ALSO,  CRITERION  FUNCTIONS  ARE  PLOTTED  BUT  NOT  PASSED 
BACK  TO  THE  CALLING  PROGRAM. 

SUBPROGRAMS  CALLED:  CSQREG.CPTENT.PLOTXY.FTERP, AUTDEN.RELMIN.MIN 


C************************************************************ 


COMMON  X  (500) , Y  (500) .RANKX (500) .RANKY  (500) 

DIMENSION  I  NO  (97) , RAOSQ (500) , I ORD  (49) , 

+HD (4)  , OT I L  (500) 

DIMENSION  ME NT (3) 

COMPLEX  ARGM(13,13) ,PHI (50,50) ,CEXP,CONJG,CMPLX,ZARG 
COMPLEX  ALPHX  (5)  ,ALPHY(5)  , COF  (97) 

DATA  I ORD/25, 24, 18, 17, 26, 32, 19, 3U33.23. 11. 16. 10, 30, 12, 9, 
+27.39,20,38,34,40.13,37,41,22,4,15.3.29,5.8,2,36,6,1,28. 
+46,21,45,35,47. 14,44,42,48,7.43,49/ 

DATA  MENT/8,24,48/ 

REAL  LGDHAT 
IF  (N.GT.29)  GO  TO  20 
WRITE  (6, 10)  N 

10  FORMAT (10X, 'SAMPLE  SIZE  ',12,'  IS  TOO  SMALL.  CMPINF  SKIPPED.') 
RETURN 
C 

C  SET  VALUES  OF  CONSTANTS 
C 


20  N2-N-2 

00  21  1-1,4 

21  HO  (0-999*0 
C 

C  FOR  THIS  VERSION  USING  COMPLEX  SEQUENTIAL  REGRESSION  THE 
C  MAXIMUM  APPROXIMATING  ORDER  IS  SET  AT  7. 

C 

7 

l -MOD (M, 2) 

ML-  (M-L) /2 
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IF(L.EQ.O)  M-M+l 
M2-2*M- 1 
MM-M*M 
M1-MM+1 
MM1-MM-1 

DENOM-1 . 0/f LOAT  (N+l) 

TW0PI-8.0*ATAN (1 .0) 

PI*TW0PI/2.0 

C 

C  COMPUTE  NEAREST  NEIGHBOR  DENSITY  ESTIMATE  AND  RAW  ESTIMATE 
C  OF  ENTROPY 
C 

HDO-O.O 
DO  30  1-3. N2 
DO  25  J*1 .N 

RADSQ  (J) - (RANKX  ( I ) -RANKX (J) ) **2+  (RANKY  ( I ) -RANKY  (J) ) **2 

25  CONTINUE 

DO  26  K«l,5 

CALL  MIN  (RADSQ, N.RMIN, INDR) 

RADSQ  ( I NDR) -FLOAT (2*N*N) 

26  CONTINUE 

VKJ-RMI N*D£NOM*DENOM*PI 
IF  (VKJ.EQ.O.O)  VKJ-0.5*DENOM*DENOM*PI 
C 

C  DTIL  IS  ALOG(DTIL) 

C 

DTI L (I) ^ALOG (5-0/ (FLOAT (N-M ) *VK J) ) 

HDO-HOO'DT I L (I ) 

30  CONTINUE 

HD  (4) -HDO/FLOAT (N-4) 

C 

C  COMPUTE MATRIX OF EXPONENTIAL CROSS-PRODUCTS TO BE USED FOR
C  COVARIANCE COMPUTATIONS
C
      DO 50 I=1,M2
      I1=I-M
      DO 50 J=1,M2
      J1=J-M
      ARGM(I,J)=CMPLX(0.0,0.0)
      DO 40 K=3,N2
      ARG=TWOPI*(FLOAT(I1)*RANKX(K)+FLOAT(J1)*RANKY(K))*DENOM
      ZARG=CMPLX(0.,ARG)
      ARGM(I,J)=ARGM(I,J)+CEXP(ZARG)
   40 CONTINUE
      ARGM(I,J)=ARGM(I,J)/FLOAT(N-4)
   50 CONTINUE
C 

C  COMPUTE  COVARIANCE  MATRIX 
C 

      DO 60 IN=1,MM
      I=IN-1
      I2=MOD(I,M)
      I1=(I-I2)/M+M-ML
      I2=I2+M-ML
      DO 55 JN=1,IN
      J=JN-1
      J2=MOD(J,M)
      J1=(J-J2)/M
      II=I1-J1+ML
      JJ=I2-J2+ML
      J1=J1+M-ML
      J2=J2+M-ML
      PHI(IN,JN)=ARGM(II,JJ)-ARGM(I1,I2)*CONJG(ARGM(J1,J2))
      PHI(JN,IN)=CONJG(PHI(IN,JN))
   55 CONTINUE
   60 CONTINUE
C 

C  COMPUTE  LAST  ROW  OF  COVARIANCE  MATRIX 
C 

      DBAR=0.0
      DO 70 I=3,N2
      DBAR=DBAR+DTIL(I)
   70 CONTINUE
      DBAR=DBAR/FLOAT(N-4)
      DO 90 IN=1,MM
      I=IN-1
      I2=MOD(I,M)
      I1=(I-I2)/M-ML
      I2=I2-ML
      PHI(M1,IN)=CMPLX(0.0,0.0)
      DO 80 K=3,N2
      ARG=TWOPI*(FLOAT(I1)*RANKX(K)+FLOAT(I2)*RANKY(K))*DENOM
      ZARG=CMPLX(0.0,ARG)
      PHI(M1,IN)=PHI(M1,IN)+DTIL(K)*CONJG(CEXP(ZARG))
   80 CONTINUE
      PHI(M1,IN)=PHI(M1,IN)/FLOAT(N-4)-DBAR*CONJG(ARGM(I1+M,I2+M))
      PHI(IN,M1)=CONJG(PHI(M1,IN))
   90 CONTINUE
      PHI(M1,M1)=0.0
      DO 100 K=3,N2
      PHI(M1,M1)=PHI(M1,M1)+DTIL(K)*DTIL(K)
  100 CONTINUE
      PHI(M1,M1)=PHI(M1,M1)/FLOAT(N-4)-DBAR*DBAR
C 

C  CALL  ROUTINE  CPTENT  TO  COMPUTE  AND  PLOT  CRITERION  FUNCTIONS  AND 
C  DETERMINE  BEST  AND  SECOND  BEST  MODELS  FOR  D(U1,U2) 

C 

      CALL CPTENT(RANKX,RANKY,N,M,PHI,IORD,IND,COF,MENT,HD)

C 

C  COMPUTE  UNIVARIATE  DENSITY  ESTIMATES  USING  AUTOREGRESSIVE 
C  TECHNIQUE 
C 

      WRITE (6,134)
  134 FORMAT(1H1)
      CALL AUTDEN(X,N,IDQX,IPLT2,MORD,ALPHX,RVARX,SIGX,NVX,0,4H X  )
      WRITE (6,134)
      CALL AUTDEN(Y,N,IDQY,IPLT2,MORD,ALPHY,RVARY,SIGY,NVY,1,4H Y  )
      WRITE (6,134)
      WRITE (6,135) NVX,NVY
  135 FORMAT(/,10X,'UNIVARIATE BEST ORDERS:  NVX =',I3,',  NVY =',I3)
      DO 260 ITER=1,3
      PSI=0.0
      ENT=0.0
      WRITE (6,136)
  136 FORMAT(/,19X,'U1',17X,'U2',15X,'DQHT',15X,'DHAT',/,2X,
     +19(4H----))
      DO 220 I=1,40
      DO 220 J=1,40
      U1=FLOAT(I)/41.
      U2=FLOAT(J)/41.

C 

C  COMPUTE  VALUES  OF  UNIVARIATE  DENSITY-QUANTILE  FUNCTIONS 
C 

      FQX=1.0
      IF (NVX.GT.0) FQX=AREST(U1,NVX,RVARX,ALPHX)
      FQX=FQFNC(U1,IDQX)/(FQX*SIGX)
      FQY=1.0
      IF (NVY.GT.0) FQY=AREST(U2,NVY,RVARY,ALPHY)
      FQY=FQFNC(U2,IDQY)/(FQY*SIGY)
C
C  COMPUTE BIVARIATE DENSITY QUANTILE BY FORMING PRODUCT
C  OF DEPENDENCE DENSITY AND AUTOREGRESSIVE ESTIMATORS
C
      LGDHAT=0.0
      KP=MENT(ITER)
      LOC=1
      IF (ITER.EQ.2) LOC=MENT(1)+1
      IF (ITER.EQ.3) LOC=MENT(1)+MENT(2)+1
      DO 200 K=1,KP
      II=IND(LOC)-1
      I2=MOD(II,M)
      I1=(II-I2)/M-ML
      I2=I2-ML
      ARG=TWOPI*(FLOAT(I1)*U1+FLOAT(I2)*U2)
      ZARG=CMPLX(0.0,ARG)
      LGDHAT=LGDHAT+REAL(COF(LOC)*CEXP(ZARG))
      LOC=LOC+1
  200 CONTINUE
      IF (LGDHAT.GT.170.) RETURN
      IF (LGDHAT.LT.-20.) LGDHAT=-20.
      DHAT=EXP(LGDHAT)
      ENT=ENT-LGDHAT*DHAT
      PSI=PSI+DHAT
      DQHT=DHAT*FQX*FQY
      WRITE (ITER,210) U1,U2,DQHT,DHAT
  210 FORMAT(2X,4F19.10)
      IMOD=MOD(I,8)
      JMOD=MOD(J,8)
      IF ((IMOD.EQ.0).AND.(JMOD.EQ.0)) WRITE (6,210) U1,U2,DQHT,DHAT
C
  220 CONTINUE
      PSI=PSI/1681.0
      ENT=ENT/1681.0
      HD(ITER)=ENT/PSI+ALOG(PSI)

      WRITE (6,224)
  224 FORMAT(//,10X,20(4H----),//)
      WRITE (6,225) MENT(ITER),PSI
  225 FORMAT(/,10X,'INTEGRATING FACTOR FOR ORDER ',I3,' IS ',F10.4)
      WRITE (6,230)
  230 FORMAT(//,10X,'COEFFICIENTS FOR BIVARIATE DEPENDENCE DENSITY',
     +//,12X,'NU1',2X,'NU2',2X,' REAL(COF) IMAG(COF) ',/,10X,32(1H-))
      LOC=1
      IF (ITER.EQ.2) LOC=MENT(1)+1
      IF (ITER.EQ.3) LOC=MENT(1)+MENT(2)+1
      DO 250 I=1,KP
      II=IND(LOC)-1
      I2=MOD(II,M)
      I1=(II-I2)/M-ML
      I2=I2-ML
      WRITE (6,240) I1,I2,COF(LOC)
  240 FORMAT(10X,2I5,2F10.4)
      LOC=LOC+1
  250 CONTINUE
      WRITE (6,224)
  260 CONTINUE
RETURN 
END 
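
The final loop of the listing above evaluates the exponential model for the dependence density on a grid and converts the grid sums into an entropy via HD = ENT/PSI + ALOG(PSI). The following is a minimal Python sketch of that computation, not part of the original program. The function name `dependence_entropy` and the dictionary layout of the coefficients are assumptions made for illustration, and the sketch averages over the number of grid points actually visited, whereas the listing divides by 1681.

```python
import cmath
import math


def dependence_entropy(coefs, grid=40):
    """Entropy of the normalized exponential-model dependence density.

    coefs maps a frequency pair (nu1, nu2) to the complex coefficient of
    exp(2*pi*i*(nu1*u1 + nu2*u2)) in log d(u1, u2).  (Hypothetical layout.)
    Returns (entropy, psi) where psi is the integrating factor.
    """
    psi = 0.0
    ent = 0.0
    for i in range(1, grid + 1):
        for j in range(1, grid + 1):
            u1 = i / (grid + 1)
            u2 = j / (grid + 1)
            # log d-hat is the real part of the trigonometric sum
            log_d = sum(
                (c * cmath.exp(2j * math.pi * (n1 * u1 + n2 * u2))).real
                for (n1, n2), c in coefs.items()
            )
            d = math.exp(log_d)
            psi += d
            ent -= log_d * d
    npts = grid * grid
    psi /= npts
    ent /= npts
    # Entropy of d/psi: (-mean(d log d))/psi + log(psi)
    return ent / psi + math.log(psi), psi
```

With no coefficients the model is the uniform (independence) density and the entropy is zero; any non-constant model gives a negative value on the grid.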

      SUBROUTINE CPTENT(RANKX,RANKY,N,M,PHI,IORD,IND,COF,MENT,HD)

C *************************************************************** 

c 

C  SUBPROGRAM  TO  COMPUTE  AND  PLOT  ENTROPY  OF  D-HAT. 

C  THIS  SUBROUTINE  WILL  ALSO  COMPUTE  THE  CRITERION  FUNCTION 
C  AIC  AND  PRINT  THE  SMALLEST  TWO  RELATIVE  MINIMA. 

C  COEFFICIENTS  FOR  THE  THREE  ORDERS  SPECIFIED  IN  MENT 

C  WILL  BE  RETURNED  IN  COF  WITH  THE  CORRESPONDING  INDICES 

C  IN  IND. 

C 

C  INPUT:  N,M  -  SAMPLE  SIZE,  UNIVARIATE  MAXIMUM  ORDER  (M**2 
C  USED  FOR  BIVARIATE  MAX  ORDER) 

C  RANKX,RANKY  -  VECTORS  OF  RANKS  AND  CO-RANKS

C  PHI  -  COVARIANCE  MATRIX 

C  IORD  -  VECTOR  OF  ORDERED  INDICES  FOR  SEQUENTIAL  REGRESSION 

C 

C  AUXILIARY:  NVAR,RVAR,BEST  -  VECTORS  AND  MATRIX
C  FROM  ROUTINE  CSQREG 

C 

C  OUTPUT:  COF, IND  -  SEE  ABOVE 
C 


C  SUBROUTINES  CALLED:  PLOTXY,FTERP,MINMAX,RELMIN,MIN,CSQREG,CSWEEP
C 

C******************************************************************* 

c 

      DIMENSION RANKX(N),RANKY(N),AIC(50),
     +MENT(3),IND(97),HD(4)
      DIMENSION IORD(49),NVAR(49),INDV(1225),RVAR(49)
      COMPLEX PHI(50,50),COF(97),BEST(1225)
      REAL LGDHAT
      MM=M*M
      MM1=MM-1
      L=MOD(M,2)
      ML=(M-L)/2
      TWOPI=8.0*ATAN(1.0)
C
C  CALL ROUTINE CSQREG TO PERFORM SEQUENTIAL REGRESSION ON PHI
C
      CALL CSQREG(PHI,50,MM,IORD,BEST,INDV,1225,RVAR,NVAR,NIVN)
      CALL CLPLT1(RVAR,NIVN,1,4HRVAR,41,1)
      NV1=NIVN+1
C
C  COMPUTE AIC CRITERION FUNCTION
C

      AIC(1)=-1./FLOAT(N)

      AIC(2)=ALOG(RVAR(1))+2.0/FLOAT(N)
      DO 30 I=2,NIVN
      AIC(I+1)=ALOG(RVAR(I))+2.*FLOAT(I)/FLOAT(N)
   30 CONTINUE
      DEC=1.0/FLOAT(N)
      WRITE (6,40)
   40 FORMAT(/,10X,'OUTPUT FROM RELMIN FOR ORDER DETERMINING',
     +' CRITERION. (SUBTRACT ONE FOR TRUE ORDER)',/)
      CALL RELMIN(AIC,NV1,DEC,MIN1,MIN2,4H AIC)
      MIN1=MIN1-1
      MIN2=MIN2-1
      WRITE (6,50) MIN1,MIN2
   50 FORMAT(/,10X,'BEST ORDER BY AIC =',I3,/,10X,'2ND BEST ORDER ',
     +'BY AIC =',I3,/)
C
C  COMPUTE ENTROPY MEASURE FOR EACH ORDER
C


      LOC=1
      DO 180 I=1,3
      K=MENT(I)
      IF (K.EQ.0) GO TO 180
      K1=NVAR(K)
      DO 170 KK=1,K
      IND(LOC)=INDV(K1)
      COF(LOC)=BEST(K1)
      K1=K1+1
      LOC=LOC+1
  170 CONTINUE
  180 CONTINUE
      RETURN
      END
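
The order-selection step in CPTENT computes AIC(m) = log RVAR(m) + 2m/N from the residual variances returned by the sequential regression. The following Python sketch illustrates that criterion only. It is a simplification under stated assumptions: the name `aic_orders` is hypothetical, the order-zero value is taken as 0, and the global minimum is returned, whereas the listing assigns its own order-zero value and searches for relative minima with RELMIN.

```python
import math


def aic_orders(rvar, n):
    """Return (aic, best_order) for residual variances rvar[0..p-1].

    rvar[k] is the residual variance of the model of order k + 1,
    expressed as a fraction of the total variance; n is the sample size.
    """
    aic = [0.0]  # order 0: assumed value, see the lead-in
    for k, r in enumerate(rvar):
        order = k + 1
        aic.append(math.log(r) + 2.0 * order / n)
    best = min(range(len(aic)), key=lambda m: aic[m])
    return aic, best
```

For example, `aic_orders([0.5, 0.3, 0.299], 100)` selects order 2, because the third term lowers the residual variance by less than the penalty 2/N.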


APPENDIX  C


C.1  Subprogram  CSQREG


      SUBROUTINE CSQREG(A,NDIM,NIV,IORD,BEST,INDV,MDIM,RVAR,NVAR,NIVN)
C*****************************************************************
C
C  SUBPROGRAM TO PERFORM SEQUENTIAL REGRESSION USING COVARIANCE
C  OR CORRELATION MATRIX A(NIV+1,NIV+1).
C


C  INPUT:  A  -  COVARIANCE  MATRIX  (COMPLEX) 

C  NDIM  -  ROW  DIMENSION  OF  A  IN  CALLING  PROGRAM 

C  NIV  -  NUMBER  OF  INDEPENDENT  VARIABLES 

C  IORD - INTEGER VECTOR CONTAINING INDICES OF VARIABLES
C         IN THE ORDER THEY ARE TO BE ENTERED INTO THE MODEL
C  MDIM - DIMENSION OF BEST IN CALLING PROGRAM
C
C  OUTPUT:  A - SWEPT COVARIANCE MATRIX
C  BEST,INDV - VECTORS OF SUBSET INFORMATION
C         BEST CONTAINS LEAST SQUARES PARAMETER ESTIMATES
C         INDV CONTAINS VARIABLE INDICES
C  RVAR - VECTOR OF RESIDUAL VARIANCES
C  IORD - VECTOR CONTAINING INDICES OF VARIABLES IN ORDER
C         THAT THEY WERE ENTERED WITH VALUES CAUSING
C         SINGULARITIES IN A OMITTED

C  NIVN  -  NUMBER  OF  INDEPENDENT  VARIABLES  INCLUDED  IN 

C  ANALYSIS 

C 

C  SUBPROGRAMS  CALLED:  CSWEEP 
C 

C***************************************************************** 

C 

      COMPLEX A(NDIM,NDIM),BEST(MDIM)
      DIMENSION INDV(MDIM),IORD(NIV),RVAR(NIV),NVAR(NIV)
      DATA TOL/1.E-20/
      NV=NIV+1
      NIVN=NIV
      VAR=REAL(A(NV,NV))
      LOC=1
      LC2=1
      KOUNT=1
      K=1
   10 ID=IORD(K)
      KOUNT=KOUNT+1
      TEST=REAL(A(ID,ID))**2+AIMAG(A(ID,ID))**2
      IF (TEST.LE.TOL) GO TO 40
      CALL CSWEEP(A,NDIM,NV,ID,ID)
      RVAR(K)=REAL(A(NV,NV))/VAR
      DO 30 KK=1,K
      KID=IORD(KK)
      IF (KK.NE.1) GO TO 20
      NVAR(LC2)=LOC
      LC2=LC2+1
   20 INDV(LOC)=KID
      BEST(LOC)=-A(NV,KID)
      LOC=LOC+1
   30 CONTINUE
      GO TO 60
   40 NIVN=NIVN-1
      DO 50 I=K,NIVN
      IORD(I)=IORD(I+1)
   50 CONTINUE
      GO TO 10
   60 K=K+1
      IF(KOUNT.LE.NIV) GO TO 10
      RETURN
      END


C.2  Subprogram  CSWEEP 


      SUBROUTINE CSWEEP(A,NDIM,N,K1,K2)
C**********************************************************
C
C  SUBROUTINE TO SWEEP THE NXN COMPLEX MATRIX A ON ITS K1
C  THRU K2 DIAGONAL ELEMENTS (SWP(K)SWP(K)A=A)
C
C  INPUT :
C     A  N  K1  K2
C     NDIM : ROW DIMENSION OF A IN CALLING PROGRAM

C 

C  OUTPUT  : 

C  A 

C 

C  SUBROUTINES  CALLED  :  NONE 
C 

C *********************************************************** 

C 

      COMPLEX D,A(NDIM,NDIM)
      DATA NOUT/6/
C
C  FIX DIAGONAL K :
C
      DO 50 K=K1,K2
C
C  CHECK FOR ZERO :
C
      TEST=REAL(A(K,K))**2+AIMAG(A(K,K))**2
      IF (TEST.LT.1.E-25) GO TO 99
      D=1./A(K,K)
      A(K,K)=1.
C
C  KTH ROW :
C
      DO 10 I=1,N
   10 A(K,I)=D*A(K,I)
C
C  KTH COLUMN :
C
      DO 20 J=1,N
      IF (J.EQ.K) GO TO 20
      A(J,K)=-A(J,K)*D
   20 CONTINUE
C
C  OTHERS :
C
      DO 40 J=1,N
      IF (J.EQ.K) GO TO 40
      DO 30 I=1,N
      IF(I.EQ.K) GO TO 30
      A(J,I)=A(J,I)+A(J,K)*A(K,I)/D
   30 CONTINUE
   40 CONTINUE
C
   50 CONTINUE
C
      GO TO 110
   99 WRITE(NOUT,100) K,K1,K2
  100 FORMAT(10X,I2,15HTH DIAG OF FROM,1X,I2,1X,2HTO,1X,
     1I2,1X,17HIS ZERO IN CSWEEP)
  110 RETURN
      END
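
The sweep performed by CSWEEP can be restated compactly. The Python sketch below is an illustration only, not part of the original listing: it follows the same three steps (scale the pivot row, negate and scale the pivot column, update the remaining entries) for a single pivot, using a list of lists in place of the FORTRAN array. The function name `sweep` and the exception used for a zero pivot are assumptions.

```python
def sweep(a, k):
    """Sweep the square matrix a (list of lists, real or complex)
    in place on its k-th diagonal element (0-based) and return it."""
    n = len(a)
    pivot = a[k][k]
    if abs(pivot) ** 2 < 1e-25:
        raise ZeroDivisionError("pivot %d is zero in sweep" % k)
    d = 1.0 / pivot
    a[k][k] = 1.0
    # k-th row: a[k][i] <- a[k][i] / pivot (the diagonal becomes 1/pivot)
    for i in range(n):
        a[k][i] = d * a[k][i]
    # k-th column: a[j][k] <- -a[j][k] / pivot
    for j in range(n):
        if j != k:
            a[j][k] = -a[j][k] * d
    # remaining entries: a[j][i] <- a[j][i] - a[j][k] * a[k][i] / pivot,
    # written in terms of the already updated row and column
    for j in range(n):
        if j == k:
            continue
        for i in range(n):
            if i == k:
                continue
            a[j][i] = a[j][i] + a[j][k] * a[k][i] / d
    return a
```

Sweeping the 2x2 matrix [[2, 1], [1, 3]] on its first diagonal element leaves the residual variance 3 - 1/2 = 2.5 in the lower-right corner, and sweeping the result a second time on the same pivot restores the original matrix, which is the property the header comment states as SWP(K)SWP(K)A=A.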


C.3  Subprogram  AUTDEN 


      SUBROUTINE AUTDEN(W,N,IDQH,IPLT2,MORD,ALPH,RVARW,SIGO,NVW,
     +ISORT,WLAB)
C***************************************************************
C
C  THIS SUBPROGRAM COMPUTES A SMOOTHED DENSITY QUANTILE
C  FUNCTION BASED ON THE AUTOREGRESSIVE METHOD OF PARZEN (1979).
C  THIS ROUTINE IS BASED ON THE ONESAM PROGRAM DENSITY ESTIMATION
C  ROUTINE AND USES MANY OF THE SUBPROGRAMS OF ONESAM.  SEE
C  PARZEN AND ANDERSON (1980) FOR DOCUMENTATION.
C
C  INPUT:  W - RAW DATA
C          N - SAMPLE SIZE
C          IDQH - INDICATOR FOR NULL DIST. OF W
C          ISORT - 0 IF W AND RANKW SORTED, 1 OTHERWISE.
C          MORD - MAXIMUM ALLOWABLE ORDER (<=6)
C          IPLT2 - 0 --> NO PLOTS
C                  1 --> PLOT OF AR DENSITY-QUANTILE FUNCTION
C          WLAB - VARIABLE NAME FOR W IN A4 FORMAT
C
C  OUTPUT: NVW - ORDER OF AUTOREGRESSIVE DENSITY ESTIMATOR
C          ALPH - COEFFICIENTS FOR AUTOREGRESSIVE REPRESENTATION
C          RVARW - RESIDUAL VARIANCE FOR BEST ORDER
C          SIGO - INTEGRATING FACTOR (SIGMA-TILDA FOR NULL MODEL)
C
C  SUBPROGRAMS CALLED:  ORD,QHLIN,QTOFQ,WSPACE,FORIER,AUTORG,PARZ,
C                       AREST,FQFNC,MDNRIS,QFIND,PLOTXY,FTERP,MINMAX,
C                       MIN,MAX

C 

C *************************************************************** 

      DIMENSION W(N),RVAR(5),U(500),QN(500),QL(500),FQ(500),
     +WXS(500),CWXS(500),ILOC(5),T(500),CAT(5),WK1(500),WK2(500)
      DIMENSION CAPT(20)
      COMPLEX A(5),PHI(5),ALPH(5),ALPHA(15),RESVAR
      DATA CAPT/4HUNIV,4HARIA,4HTE D,4HENSI,4HTY-Q,4HUANT,4HILE ,
     +4HFOR ,4HRAND,4HOM V,4HARIA,4HBLE ,8*4H    /
      CAPT(13)=WLAB
      WRITE (6,1) WLAB
    1 FORMAT(//,10X,'UNIVARIATE DENSITY ESTIMATION RESULTS FOR ',
     +'VARIABLE ',A4,//)
      DO 5 I=1,N
      T(I)=W(I)
    5 CONTINUE
      N2=N+2
      M=MORD+1
      IF(M.GT.6) M=6
      MM1=M-1
      H=1./FLOAT(N+1)
      IF (H.LT.0.02) H=0.02
      IF(ISORT.EQ.0) GO TO 10
      CALL ORD(T,N)
   10 CONTINUE
      TMIN=T(1)
C
C  COMPUTE N EQUALLY SPACED U VALUES BETWEEN 0 AND 1
C
      U(1)=0.0
      DO 30 J=1,N
      U(J+1)=FLOAT(J)*H
   30 CONTINUE
C 


C  COMPUTE QUANTILE FUNCTION VIA LINEAR INTERPOLATION
C
      CALL QHLIN(T,N,H,U,QN,0,NQ,TMIN,TINT,N2,WK1,WK2)
C
C  COMPUTE LITTLE Q AND FQ=1/(LITTLE Q)
C
      NP1=NQ+1
      CALL QTOFQ(QN,U,NP1,QL,FQ)
C
C  COMPUTE WEIGHTED SPACINGS (LITTLE D(U)) BASED ON IDQH DIST.
C
      NP1=NQ+1
      CALL WSPACE(WXS,CWXS,NP1,FQ,IDQH,U,SIGO)
C
C  COMPUTE FOURIER TRANSFORM OF WEIGHTED SPACINGS
C
      CALL FORIER(WXS,U(2),N,A,M)

C 

C  COMPUTE  AUTOREGRESSIVE  COEFFICIENTS  FOR  ORDERS  1  TO  M 
C 

      II=1
      DO 100 K=1,MM1
      KP1=K+1
      CALL AUTORG(A,KP1,M,ALPH,PHI,RESVAR)
      RVAR(K)=REAL(RESVAR)
      ILOC(K)=II
      DO 90 J=1,K
      ALPHA(II)=ALPH(J)
      II=II+1
   90 CONTINUE
  100 CONTINUE
      CALL PARZ(RVAR,M-1,N,CAT,NVW)
      IF(NVW.EQ.0) GO TO 115
      LOC=ILOC(NVW)
      DO 110 I=1,NVW
      ALPH(I)=ALPHA(LOC)
      LOC=LOC+1
  110 CONTINUE
  115 CALL CLPLT1(RVAR,M-1,1,4HRVAR,41,1)

C
C  COMPUTE UNIVARIATE DENSITY-QUANTILE AT 100 POINTS AND PLOT
C
      WRITE (6,120) SIGO
  120 FORMAT(/,10X,'SIGO = ',F10.4)
      RVARW=RVAR(NVW)
      DO 160 I=1,100
      U(I)=FLOAT(I)/101.0
      FI=1.0
      IF (NVW.GT.0) FI=AREST(U(I),NVW,RVARW,ALPH)
      IF (FI.EQ.0.) FI=H
      FQ(I)=FQFNC(U(I),IDQH)/(FI*SIGO)
  160 CONTINUE
      IF (IPLT2.EQ.1)
     +CALL PLOTXY(U,FQ,100,CAPT,4H   U,4H  FQ,1)
      RETURN
      END


C.4  Subprogram  LEGP 


SUBROUTINE  LEGP  (N,P, IDEG) 

C 

C *********************************************************** 

c 

C  SUBROUTINE  TO  GENERATE  3XN  MATRIX  P  OF  COEFFICIENTS 
C  OF  LEGENDRE  POLYNOMIALS.  ROW  3  CONTAINS  COEFFICIENTS 
C  FOR  THE  LEGENDRE  POLYNOMIAL  (OVER  (-1,1))  OF 
C  DEGREE  IDEG  (DETERMINED  BY  SUBROUTINE  AFTER  INITIAL  CALL) 

C 

C  INPUT:  N  -  SAMPLE  SIZE  (OR  HIGHEST  ORDER  DESIRED) 

C  IDEG - 0 --> FOR FIRST CALL
C         ORDER OF POLYNOMIAL IN THIRD ROW FOR SUBSEQUENT
C         CALLS (PROVIDED BY ROUTINE)
C
C  OUTPUT:  P - THE 3XN MATRIX OF LEGENDRE POLYNOMIAL
C           COEFFICIENTS.

C  ALGORITHM:  THE  SECOND  ORDER  RECURSION  RELATION  COMMONLY 
C  FOUND  IN  MOST  TEXTBOOKS  (SEE,  E.G.,  CHURCHILL, 

C  "SPECIAL  FUNCTIONS") 

C 

C************************************************************ 

C 

      DIMENSION P(3,N)
      IF (IDEG.NE.0) GO TO 30
      DO 20 I=1,3
      DO 10 J=1,N
      P(I,J)=0.0
   10 CONTINUE
   20 CONTINUE
      P(1,1)=1.0
      P(2,2)=1.0
      IDEG=2
      GO TO 70
   30 IDEG=IDEG+1
      DO 60 J=1,IDEG
      P(1,J)=P(2,J)
      P(2,J)=P(3,J)
   60 CONTINUE
   70 CONTINUE
      A=(2.0*FLOAT(IDEG)-1.0)/FLOAT(IDEG)
      B=FLOAT(IDEG-1)/FLOAT(IDEG)
      II=IDEG+1
      DO 100 J=1,II
      IF (J.GT.1) GO TO 80
      P(3,1)=-B*P(1,1)
      GO TO 100
   80 P(3,J)=A*P(2,J-1)-B*P(1,J)
  100 CONTINUE
      RETURN
      END
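
LEGP builds the power-series coefficients of the Legendre polynomials from the three-term recursion n P_n(x) = (2n-1) x P_{n-1}(x) - (n-1) P_{n-2}(x). The Python sketch below is an illustration only, not part of the original listing. It applies the same recursion but keeps every degree in a list rather than the rolling three-row matrix of the subroutine; the name `legendre_coefficients` is an assumption.

```python
def legendre_coefficients(max_degree):
    """Return a list whose n-th entry holds the coefficients of P_n,
    lowest power first, for n = 0..max_degree (polynomials on (-1, 1))."""
    polys = [[1.0], [0.0, 1.0]]
    for n in range(2, max_degree + 1):
        a = (2.0 * n - 1.0) / n
        b = (n - 1.0) / n
        prev, prev2 = polys[n - 1], polys[n - 2]
        coef = []
        for j in range(n + 1):
            # x * P_{n-1} shifts the coefficients of P_{n-1} up by one power
            term = a * prev[j - 1] if j >= 1 else 0.0
            if j < len(prev2):
                term -= b * prev2[j]
            coef.append(term)
        polys.append(coef)
    return polys[: max_degree + 1]
```

For degree 2 the recursion gives the coefficients -1/2, 0, 3/2, that is P_2(x) = (3x^2 - 1)/2.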


APPENDIX  D


D.1  Subprogram  PLOTXY 


      SUBROUTINE PLOTXY(X,Y,N,CAPT,NAMX,NAMY,IOPT)
C***********************************************************
C
C  SUBROUTINE TO PRINT AND PRINTER PLOT THE N-VECTOR Y AS A
C  FUNCTION OF X.
C
C  INPUT :  N,X,Y - X IS ORDERED ON INPUT AND Y(I)=Y(X(I))
C           CAPT - LITERAL CONSTANT FOR TITLE OF PLOT IN 20A4 FORMAT
C           NAMX,NAMY : 4 CHARACTER LITERAL CONSTANTS GIVING
C                       LABELS FOR X AND Y
C           IOPT : 1,2 (POINT OR BAR PLOT)
C
C  SUBROUTINES CALLED :  FTERP,MAX,MIN
C 

C *********************************************************** 

c 

      DIMENSION X(N),Y(N),T(46),YI(46),CAPT(20),AL(101)
      DATA NOUT/6/
      DATA BLANK,DOT,Z,SL,PLUS/1H ,1H.,1H*,1H|,1H+/
C
      MM=81
      IOPTY=0
      IF (N.GT.19) GO TO 11
      WRITE(NOUT,10) N
   10 FORMAT(10X,'SAMPLE SIZE OF ',I2,' IS TOO SMALL TO PERFORM ',
     +'INTERPOLATION IN PLOTXY.')
      GO TO 100
   11 CONTINUE
      WRITE (NOUT,13) CAPT
   13 FORMAT(1H1,33X,20A4,/)
C
C  CREATE T VECTOR OF EQUALLY SPACED X AND INTERPOLATE TO OBTAIN
C  CORRESPONDING Y VALUES
C
      DEC=(X(N)-X(1))/45.0
      DO 15 I=1,46
      T(I)=X(1)+FLOAT(I-1)*DEC
   15 CONTINUE
      CALL FTERP(X,Y,T,YI,N,46)

C
C  INITIALIZE AL :
C
      ON=(MM-1)/2
      DO 20 J=1,MM
   20 AL(J)=DOT
      WRITE (NOUT,25) NAMX,NAMY,(AL(J),J=1,MM)
   25 FORMAT(/,16X,A4,6X,A4/10X,20(1H-),2X,101A1)
      DO 30 J=1,MM
   30 AL(J)=BLANK
      AL(1)=SL
      AL(MM)=SL
C
C  FIND MAX AND MIN :
C
      CALL MAX(YI,46,YMAX,IND)
      CALL MIN(YI,46,YMIN,IND)
      RY=1.2*(YMAX-YMIN)
      IF(RY.LT.1.E-20) IOPTY=1
C 

C  PLOT  : 

C 

      DO 40 J=1,46
      IF(IOPTY.EQ.1) GO TO 36
      C1=(YI(J)-YMIN)/RY
      C1=2.*(C1-.5)
      GO TO 37
   36 C1=0.
   37 K=ON*(C1+1.)+2.5
      AL(K)=Z
      IF (IOPT.EQ.1) GO TO 35
      DO 39 I=1,K
   39 AL(I)=Z
   35 CONTINUE
      WRITE (NOUT,38) T(J),YI(J),(AL(I),I=1,MM)
   38 FORMAT(10X,F10.4,1X,F9.4,2X,101A1)
      AL(K)=BLANK
      IF (IOPT.EQ.1) GO TO 40
      DO 41 I=2,K
   41 AL(I)=BLANK
   40 CONTINUE
      DO 50 I=1,MM
   50 AL(I)=DOT
      AL(1)=PLUS
      AL(MM)=PLUS
      WRITE (NOUT,60) (AL(I),I=1,MM)
   60 FORMAT(10X,20(1H-),2X,101A1)
C
      YMAX=RY+YMIN
      WRITE (NOUT,70) YMIN,YMAX
   70 FORMAT(27X,F10.4,70X,F10.4)
  100 CONTINUE
      RETURN
      END



D.2  Subprogram  PLTXYZ


      SUBROUTINE PLTXYZ(X,Y,Z,N,CAPT,NAMX,NAMY,NAMZ,XISE,XMAXD,XMSE)

C *********************************************************** 

c 

C  SUBROUTINE  TO  PRINT  AND  PRINTER  PLOT  THE  N-VECTORS  Y  AND  Z  AS 
C  A  FUNCTION  OF  X  ON  THE  SAME  COORDINATE  SYSTEM. 

C 

C  INPUT :  N,X,Y,Z - X IS ORDERED ON INPUT AND Y(I)=Y(X(I)), ETC.
C           CAPT - LITERAL CONSTANT FOR TITLE OF PLOT IN 20A4 FORMAT
C           NAMX,NAMY,NAMZ : 4 CHARACTER LITERAL CONSTANTS GIVING
C                            LABELS FOR X, Y, AND Z
C           XISE,XMAXD,XMSE - SQUARED ERROR DIAGNOSTICS
C
C  SUBROUTINES CALLED :  FTERP,MAX,MIN
C 

C  *********************************************************** 

c 

      DIMENSION X(N),Y(N),Z(N),T(46),YI(46),ZI(46),CAPT(20),AL(101)
      DATA NOUT/6/
      DATA BLANK,DOT,STAR,SL,PLUS/1H ,1H.,1H*,1H|,1H+/
      DATA SM,SO/1HM,1HO/
C
      MM=81
      IOPTY=0
      IF (N.GT.19) GO TO 11
      WRITE (NOUT,10) N
   10 FORMAT(10X,'SAMPLE SIZE OF ',I2,' IS TOO SMALL TO PERFORM ',
     +'INTERPOLATION IN PLTXYZ.')
      GO TO 100
   11 CONTINUE
      WRITE(6,13) XISE,XMAXD,XMSE
   13 FORMAT(1H1,9X,85(1H-),/10X,'| INTEGRATED SQUARE ERROR =',E10.4,
     +5X,'MAXIMUM ABSOLUTE DIFFERENCE =',E10.4,' |',/,10X,'| MEAN',
     +' SQUARE ERROR =',E10.4,52X,'|',/,10X,85(1H-))
C
C  CREATE T VECTOR OF EQUALLY SPACED X AND INTERPOLATE TO OBTAIN
C  CORRESPONDING Y AND Z VALUES
C
      DEC=(X(N)-X(1))/45.0
      DO 15 I=1,46
      T(I)=X(1)+FLOAT(I-1)*DEC
   15 CONTINUE
      CALL FTERP(X,Y,T,YI,N,46)
      CALL FTERP(X,Z,T,ZI,N,46)

C 

C  INITIALIZE  AL  : 

C 

      ON=(MM-1)/2
      DO 20 J=1,MM
   20 AL(J)=DOT
      WRITE (NOUT,25) NAMX,NAMY,NAMZ,(AL(J),J=1,MM)
   25 FORMAT(/,16X,A4,4X,2H*=,A4,4X,2HO=,A4/10X,30(1H-),2X,101A1)
      DO 30 J=1,MM
   30 AL(J)=BLANK
      AL(1)=SL
      AL(MM)=SL
C
C  FIND MAX AND MIN :
C
      CALL MAX(YI,46,YMAX,IND)
      CALL MIN(YI,46,YMIN,IND)
      CALL MAX(ZI,46,ZMAX,IND)
      CALL MIN(ZI,46,ZMIN,IND)
      IF (ZMIN.LT.YMIN) YMIN=ZMIN
      IF (ZMAX.GT.YMAX) YMAX=ZMAX
      RY=1.2*(YMAX-YMIN)
      IF (RY.LT.1.E-20) IOPTY=1


C
C  PLOT :
C
      DO 40 J=1,46
      IF (IOPTY.EQ.1) GO TO 36
      C1=(YI(J)-YMIN)/RY
      C1=2.*(C1-.5)
      C2=(ZI(J)-YMIN)/RY
      C2=2.*(C2-.5)
      GO TO 37
   36 C1=0.
      C2=0.
   37 KY=ON*(C1+1.)+2.5
      KZ=ON*(C2+1.)+2.5
      AL(KY)=STAR
      AL(KZ)=SO
      IF (KY.EQ.KZ) AL(KY)=SM
      WRITE(NOUT,38) T(J),YI(J),ZI(J),(AL(I),I=1,MM)
   38 FORMAT(10X,F10.4,1X,F9.4,1X,F9.4,2X,101A1)
      AL(KY)=BLANK
      AL(KZ)=BLANK
   40 CONTINUE
      DO 50 I=1,MM
   50 AL(I)=DOT
      AL(1)=PLUS
      AL(MM)=PLUS
      WRITE (NOUT,60) (AL(I),I=1,MM)
   60 FORMAT(10X,30(1H-),2X,101A1)
C
      YMAX=RY+YMIN
      WRITE(NOUT,70) YMIN,YMAX
   70 FORMAT(37X,F10.4,70X,F10.4)
      WRITE (NOUT,80) CAPT
   80 FORMAT(/,10X,20A4,/)
  100 CONTINUE
      RETURN
      END


D.3  Subprogram  FTERP 


SUBROUTINE  FTERP  (U,V,X,F,N,M) 

C**********************************************************
C

C  SUBROUTINE  TO  PERFORM  LINEAR  INTERPOLATION  ON  V  TO 
C  OBTAIN  F  AT  THE  M  X  VALUES 
C 

C  INPUT:  U  -  VECTOR  OF  VALUES  AT  WHICH  V  EVALUATED 

C  V  -  FUNCTION  VALUES  TO  INTERPOLATE 

C  X  -  VALUES  AT  WHICH  INTERPOLATED  FUNCTION  TO  BE 

C  EVALUATED 

C  N  -  DIMENSION  OF  VECTORS  U  AND  V 

C  M  -  DIMENSION  OF  VECTORS  X  AND  F 

C 

C  NOTE:  ALL  ABSCISSA  VECTORS  MUST  BE  ORDERED 
C 

C  OUTPUT:  F  -  INTERPOLATED  FUNCTION  VALUES 
C 

C ********************************************************** 
      DIMENSION U(N),V(N),X(M),F(M)
      IF(N.EQ.M) GO TO 100
      II=1
      DO 60 I=1,M
   10 IF (X(I)-U(II))20,40,50
   20 IF (II.NE.1) GO TO 30
      F(I)=V(1)+(V(2)-V(1))*(X(I)-U(1))/(U(2)-U(1))
      GO TO 60
   30 F(I)=V(II-1)+(V(II)-V(II-1))*(X(I)-U(II-1))/(U(II)-U(II-1))
      GO TO 60
   40 F(I)=V(II)
      GO TO 60
   50 II=II+1
      IF (II.LT.N) GO TO 10
      II=N
      GO TO 30
   60 CONTINUE
  100 RETURN
      END
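
The interpolation loop of FTERP walks a single pointer along the ordered abscissae and extends the first and last segments linearly for points outside the range. The Python sketch below is an illustration only, not part of the original listing. It reproduces the body of that loop and omits the initial N.EQ.M test of the subroutine; the lower-case name `fterp` and the list-based interface are assumptions.

```python
def fterp(u, v, x):
    """Linearly interpolate the points (u[i], v[i]) at the ordered values x.

    u must be strictly increasing with at least two entries and x must be
    non-decreasing.  Values of x outside [u[0], u[-1]] are extrapolated
    along the first or last segment.
    """
    n = len(u)
    out = []
    ii = 0
    for xi in x:
        # advance the pointer until u[ii] >= xi or the last knot is reached
        while ii < n - 1 and xi > u[ii]:
            ii += 1
        if xi == u[ii]:
            out.append(v[ii])
            continue
        lo = max(ii - 1, 0)
        hi = lo + 1
        slope = (v[hi] - v[lo]) / (u[hi] - u[lo])
        out.append(v[lo] + slope * (xi - u[lo]))
    return out
```

For example, with u = [0, 1, 2] and v = [0, 10, 40], the value at x = 1.5 is 25 and the value at x = 3 is extrapolated to 70.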


D.4  Subprogram  SEDIAG


      SUBROUTINE SEDIAG(F,G,N,RANGE,XISE,XMAXD,XMSE)
C***************************************************************
C
C  SUBROUTINE TO COMPUTE VARIOUS SQUARED ERROR DIAGNOSTICS
C  BY RAW NUMERICAL INTEGRATION (RIEMANN SUMS).
C
C  INPUT:  F,G - VECTORS CONTAINING FUNCTION VALUES CORRESPONDING
C                TO THE SAME ARGUMENT, I.E., F(I)=F(X(I))
C                CORRESPONDS TO G(I)=G(X(I)).  F AND G MUST HAVE
C                BEEN EVALUATED AT EQUALLY SPACED X VALUES.

C  N  -  DIMENSION  OF  F  AND  G 

C  RANGE  -  RANGE  OF  X  VALUES  OVER  WHICH  F  AND  G  ARE 

C  COMPUTED. 

C 

C  OUTPUT:  XISE  -  INTEGRATED  SQUARED  ERROR 

C  XMAXD  -  MAXIMUM  ABSOLUTE  DIFFERENCE  BETWEEN  F  AND  G 

C  XMSE  -  MEAN  SQUARED  ERROR  ASSUMING  EXPECTATION  TAKEN 

C  WITH  RESPECT  TO  F. 

C 

C************************************************************** 
      DIMENSION F(N),G(N)
      XMAXD=0.0
      XISE=0.0
      XMSE=0.0
      DO 10 I=1,N
      DIF=(F(I)-G(I))*(F(I)-G(I))
      IF(XMAXD.LT.DIF) XMAXD=DIF
      XISE=XISE+DIF
      XMSE=XMSE+DIF*F(I)
   10 CONTINUE
      XMAXD=SQRT(XMAXD)
      XISE=XISE*RANGE/FLOAT(N)
      XMSE=XMSE*RANGE/FLOAT(N)
      RETURN